A workhorse retires | National Center for Supercomputing Applications at the University of Illinois

A workhorse retires

09.17.10

After providing millions of compute hours over six and a half years of faithful service, NCSA's Mercury cluster retired at the end of March.

Described by many as "a true workhorse" of a machine, it was used by scientists studying everything from mesoscale thunderstorms to the most minuscule atoms. And reliable it was, up and running about 98 percent of the time. When it was down, the cause was often beyond anyone's control.

NCSA's Dan Lapine was responsible for keeping the Mercury cluster operational from its first compute hour to its last. He recalls the time a few years ago when NCSA was working with the San Diego Supercomputer Center, Caltech, and the Southern California Earthquake Center through the TeraGrid program. Supercomputers were sharing data cross-country through a single file system, modeling earthquakes and their impact. The sites were linked by a 40-gigabit-per-second network connection, one of very few like it in the world.

The only problem? Networks aren't the only things that travel cross-country. Trains do too.

"Twice—not once—twice, we had train derailments in Colorado where the train tracks parallel the lines that do the data, and the train cut the cable. So we had our file systems crash because of a train derailment in Colorado. A really good example of how our supercomputing and its interruptions don't necessarily depend on us," says Lapine.

Mercury was among the first supercomputers based on the Intel Itanium chip. In fact, one of the test clusters that preceded Mercury had included Itanium chips with serial numbers 1 and 2.

"It was a bit of a risk. No one else was doing it at the time," says Lapine. "It allowed us to have more memory available for each machine and a faster clock speed. But a lot more performance every time the computer would do something. ...When we put the system together, we actually were the 15th fastest computer in the world."

But as technology advanced in the supercomputing world and faster systems arrived on the machine room floor, the goals for Mercury changed. "Over time, the need for Mercury to be the most leading edge became less and less and the need for consistency and stability became more and more," says Lapine. "So being able to keep the machine available 98 percent of the time—I'm quite happy with that."

More than 1,000 researchers had accounts on the system, and more than 2 million jobs were run. Jobs for researchers like Julio Facelli, a professor at the University of Utah who used Mercury to predict crystal structures for organic molecules that are frequently used in pharmaceuticals, fertilizers, and explosives. And for Georgia Tech's Marilyn Smith, who relied on Mercury to model the aerodynamic effects of wind turbines. And for the University of Illinois' Klaus Schulten, who depended on Mercury for many of his projects, including research on DNA and gene expression.

These researchers are among the hundreds of Mercury users who conducted transformative research and published papers on the results. For instance, Gautam Ghosh of Northwestern University was one of Mercury's first users. He published nine papers in four years based on work conducted on Mercury. The University of Illinois' Roman Boulatov and his team published eight. In fact, hundreds of papers can be traced back to Mercury's compute power. Not a bad legacy for a revolutionary workhorse of a supercomputer.