The Supercomputer Down the Hall: A Journey into the Guts of Goddard’s Discover Supercomputing Cluster
Have you ever seen a supercomputer? Do you know how one works?
I got a chance to look a supercomputer in the face recently, when I took an employee tour of the Discover supercomputer at Goddard Space Flight Center. It’s literally down the hall from me, but I had never gotten a chance to see it up close in the almost year since I started working here. Discover is the workhorse computing resource for the NASA Center for Climate Simulation.
It’s a pretty impressive gadget. Walking between the metal racks packed with equipment, multicolored blinky lights aglow, I thought of a famous scene in 2001: A Space Odyssey. The spaceship’s supercomputer, HAL, has gone all homicidal on the crew, so astronaut Dave Bowman climbs into its brain and starts to unplug stuff. Famously, this reduces the paranoid evil genius HAL to the level of a blubbering toddler singing “Daisy.”
Blogolicious Supercomputer Facts
Goddard Space Flight Center’s Discover supercomputer can perform approximately 159 trillion calculations per second. The supercomputer consists of:
- 14,968 processors
- 12,904 memory modules
- 35,608 gigabytes of random-access memory
- 3,120 hard drives
- 5 miles of copper cables
- 6 miles of fiber-optic cables
I would bet that if you asked 10 people on the street to draw a supercomputer, they would produce something like HAL’s nerve center — a softly humming, dimly glowing cybercave.
Or, they might sketch something like ENIAC, the Electronic Numerical Integrator And Computer. Eighty feet long and weighing 27 tons, ENIAC contained more than 17,000 vacuum tubes.
To make computers really fast in those days, you had to place their various components close together so the electrical signals wouldn’t have to travel too far. Each “trip” meant a tiny delay. Many, many delays add up to a computing traffic jam.
These days, it’s different. Supercomputers like Discover are essentially collections of many, many smaller-scale computing devices working in parallel to solve big tasks.
They are not necessarily in the same place, either. Discover’s machinery is spread across several rooms, connected by a high-speed data network. People can network into the system from across the country via data superhighways.
Now I’m going to talk some tech. And I’m going to be disgustingly precise about it. Supercomputer people talk nodes, processors, cores, and teraflops. It’s notoriously confusing, but you have to understand these terms to really get supercomputing. So here we go . . .
The functional unit of Discover is the processor, just like in your desktop PC or laptop (or iPhone or whatever). The processor is a little brain on a silicon chip. It does the number-crunching.
Waaayyyy back in the day — like, before 2005! — the motherboard of your computer sported a single processor on a single chip. If you wanted more processing power, you had to add more chips.
Not anymore. Now the little brain in your computer has multiple Central Processing Units (CPUs), or “cores,” working in parallel. The processor in my MacBook Pro, for example, contains two cores. It’s an Intel Core 2 Duo. Both cores reside on the same chip, the same little slab of silicon.
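If you’re curious how many cores your own little brain-on-a-chip has, Python’s standard library will tell you. A minimal sketch (note that `os.cpu_count()` reports logical cores, which on some chips can be higher than the physical core count):

```python
import os

# Number of logical CPU cores the operating system exposes.
# On a Core 2 Duo like the one mentioned above, this reports 2.
cores = os.cpu_count()
print(f"This machine exposes {cores} core(s)")
```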
So, are you still with me?
The Discover supercomputer uses dual-core and quad-core processors. In other words, each slab of silicon hosts two cores or four cores. For the ubergeeks in the house, the newest processors are Intel Xeons built on the architecture code-named Nehalem. (And yes, you can buy personal computers with this processor — the Mac Pro 2.66 GHz workstation, for example.)
Discover uses about 15,000 cores to crunch data. The cores exist within racks and racks of gizmos called nodes.
Each node has two Xeon Nehalem processors, for a total of either four or eight cores. So each node is equivalent to a really, really fast desktop computer, something with twice the horsepower of the aforementioned Mac Pro workstation. Each node has a hard drive for its operating system software as well as network interfaces for moving data in and out.
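The node arithmetic above works out like this. A quick sketch — the constants come from the figures in this post, and the function names are just for illustration:

```python
# Back-of-the-envelope node math for a Discover-style cluster.
PROCESSORS_PER_NODE = 2  # two processor sockets per node

def cores_per_node(cores_per_processor: int) -> int:
    """Total cores in one node: two sockets, each dual- or quad-core."""
    return PROCESSORS_PER_NODE * cores_per_processor

def nodes_needed(total_cores: int, cores_per_processor: int) -> int:
    """How many nodes it takes to supply a given core count."""
    return total_cores // cores_per_node(cores_per_processor)

print(cores_per_node(2))        # dual-core sockets -> 4 cores per node
print(cores_per_node(4))        # quad-core sockets -> 8 cores per node
print(nodes_needed(15_000, 4))  # ~1,875 nodes if every node were quad-core
```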
So what does this all mean? It means that the supercomputer at the heart of climate and weather science at NASA Goddard runs on the same kind of processors found in personal computers — perhaps yours.
The processors work in parallel, like an army of workers digging a canal with shovels. Each processor lifts a shovelful of data at a time, but if you have a lot of shovels, you end up with the Panama Canal.
Of course, the thousands of workers also need life support, like shelter, food, and water. In supercomputing terms, that means electricity and cooling systems to carry waste heat away from the processors.
A lot of clever engineering went into packing Discover into a couple of rooms. For example, the back doors of the equipment racks have heat-sucking radiators built into them. The radiators are hooked up to Goddard’s chilled water system. Having multiple cores on the same chip reduces the hardware required to prevent a cybermeltdown.
Although right now Discover crunches with 15,000 cores, a planned upgrade will bring it to around 29,000. And what does this all buy you? About 160 teraflops of computing power for the moment.
A teraflop is one trillion floating point operations per second. Flops measure the computing horsepower of a system, its ability to crunch numbers. Add two numbers in your head: you have just completed one floating point operation.
So what is 160 teraflops?
Get the entire world population (roughly 6.8 billion people) to add two numbers every second for about six and a half hours. That’s how much number-crunching Discover packs into a single second!
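Here’s the back-of-the-envelope arithmetic, with the world population as an explicit assumption (roughly the 2009 estimate). The answer scales with whatever population figure you plug in:

```python
# How long would it take everyone on Earth, each doing one addition
# per second, to match ONE second of Discover's output?
DISCOVER_FLOPS = 160e12   # 160 teraflops = 160 trillion operations/second
WORLD_POPULATION = 6.8e9  # people (assumed, ~2009 estimate)
HUMAN_RATE = 1.0          # additions per person per second

seconds = DISCOVER_FLOPS / (WORLD_POPULATION * HUMAN_RATE)
hours, remainder = divmod(seconds, 3600)
print(f"{int(hours)} h {int(remainder // 60)} min")  # prints "6 h 32 min"
```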
OH AND DID I MENTION? All opinions and opinionlike objects in this blog are mine alone and NOT those of NASA or Goddard Space Flight Center.