Hey, I have worked on this exact machine before. Neat to see they are finally decommissioning it. It would be a terrible purchase to actually use these days, though: for the cost of moving and deploying it, you could rock a few Hopper or Grace clusters that would outperform it for less than half of the operating overhead.
I fully expect it to get parted out. The actual components would be far more useful on their own as cheap homelab systems, and would be a much better ROI than using it as is. This thing is water cooled, and just the plumbing would be a nightmare to deal with if you aren't set up for it. If you are, you would be better off going with a modern architecture anyway.
We were running meteorological models mostly, but I did have a colleague who was trying to use it to predict wildlife migratory patterns using topographical mapping. It was batched out on a few projects at any given time while I was there; it was essentially time-shared between a few different research departments.
Damn that’s crazy. When I was just out of college I built the touchscreen web app that promoted this thing in the lobby of UCAR. Looks like it’s still running for now: https://hpctv.ucar.edu
It's kind of lame that they need to junk the entire apparatus after only a decade. I get that processor technology moves on apace, but we already know it does that, so why doesn't a universal architecture exist where nodes can be added at will?
It's more of an operating cost issue. It's almost decade-old hardware. It was efficient in its day, but compared to new hardware it just costs so much to run that you would be better served investing in something with modern efficiency. It won't be junked; it will be parted out. If you are someone who wants a cheap homelab with InfiniBand and shitloads of memory, you could pick up a blade for a fraction of what it would otherwise cost. I fully expect it to turn into thousands of reasonably powerful servers for the prosumer and nerd markets instead of running as a monolithic cluster.
One of the reasons why I work in industrial controls. A good day is me sneaking in tech that came after the year 2000. Employment for life and I get to branch out to related stuff. Employer is paying me to take ME and chem-e classes now.
I don't know why anyone would spend their life chasing the newest fad tech when you can pick a slow moving one, master it, and master the ones around it. Would much rather be the person who knew how the entire system works vs knowing the last 8 programming languages/frameworks only 1 of which is relevant.
But hey, glad there are people who choose that lifestyle. I like having a better cellphone every year.
If you have too many "slow" nodes in a supercomputer, you'll hit a performance ceiling where everything is bottlenecked by the speed of things that are not the CPU: memory, disk for swap, and network for sending partial results across nodes for further partial computing.
Source: I've hung around too much with people doing PhD theses on these kinds of problems.
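To put rough numbers on that ceiling, here is a toy model in Python. It is only a sketch under loud assumptions: the work is split evenly, every node waits at a barrier for the slowest one, and all the non-CPU costs (memory, swap, network) are lumped into a single made-up `comm_time`. The function names and figures are hypothetical and say nothing about how this particular machine scheduled jobs.

```python
def step_time(compute_times, comm_time):
    """One synchronized step: wait for the slowest node, then exchange results."""
    return max(compute_times) + comm_time


def speedup(node_speeds, work, comm_time):
    """Speedup versus one node of speed 1.0, with the work split evenly."""
    share = work / len(node_speeds)                   # same slice for every node
    compute_times = [share / s for s in node_speeds]  # slower node, longer time
    return work / step_time(compute_times, comm_time)


# Four identical nodes and free communication: ideal 4x.
ideal = speedup([1.0, 1.0, 1.0, 1.0], work=100.0, comm_time=0.0)

# Swap one node for a half-speed one: everyone waits for it.
one_slow = speedup([1.0, 1.0, 1.0, 0.5], work=100.0, comm_time=0.0)

# Identical nodes again, but exchanging partial results costs time.
with_comm = speedup([1.0, 1.0, 1.0, 1.0], work=100.0, comm_time=5.0)
```

In this model a single half-speed node drags four nodes down to the throughput of two, and the communication term puts a floor under the step time no matter how many fast nodes you add.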
I would imagine it's very difficult to make a universal architecture, but if I have learnt anything about computers it's that the manufacturers of software and hardware deliberately create opaque and monolithic systems, e.g. phones. They cynically insert barriers to their reuse and redeployment. There's no profit motive for corporations to make infinitely scalable computers. Short-sighted greed is a much more plausible explanation.
Starting the bidding at $2500 seems cheap, but the purchase price is probably the cheapest part once you add moving it, buying the needed cabling, and the electricity bills.
I bet manpower costs are significant as well. How many people are needed to run this thing? You probably need engineers with an esoteric set of skills to put it back together and manage it, which would not be cheap.
Edit: I looked it up. It is running SUSE Linux Enterprise Server, so maybe management isn't as specialized as I expected.
It's all Broadwell Xeons. Sure, there's 8000 of 'em, but after you factor in purchase price, moving and storage costs, time spent parting out nodes, shipping costs, etc... I think you'd have a hard time breaking even, and for an end user you can get like 4x the FLOPS per socket at half the power consumption with current server CPUs.
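Running the back-of-envelope math on those ratios, as a quick Python sketch. The inputs are just the rough figures from the comment above (about 8000 sockets, "like 4x" the FLOPS per socket, half the power), not measured numbers, and the function names are made up for illustration.

```python
def perf_per_watt_ratio(flops_ratio, power_ratio):
    """How many times more FLOPS per watt the newer socket delivers."""
    return flops_ratio / power_ratio


def equivalent_sockets(old_sockets, flops_ratio):
    """Newer sockets needed to match the old machine's total FLOPS."""
    return old_sockets / flops_ratio


# Rough figures from the comment, treated as assumptions.
efficiency_gain = perf_per_watt_ratio(flops_ratio=4.0, power_ratio=0.5)
new_sockets = equivalent_sockets(old_sockets=8000, flops_ratio=4.0)

# Total power for the same FLOPS, as a fraction of the old machine's draw.
power_fraction = (new_sockets * 0.5) / 8000
```

If those ratios hold, that is about 8x the FLOPS per watt, so roughly 2000 current sockets would match the whole machine at around one eighth of the CPU power draw.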