EcoG cluster aiding high-energy physics research

04.20.11

When Illinois students worked with faculty and NCSA staff to build one of the world's most energy-efficient supercomputers last fall, they hoped the system could be used to speed up some real scientific calculations once their benchmarking work was done. And in the months since they picked up their Green500 awards at SC10, their hopes have been realized, with post-doc Aaron Torok from Indiana University extensively using "EcoG" for his particle physics research.

Torok works with Indiana Distinguished Professor Steven Gottlieb, one of the key developers and major users of the lattice quantum chromodynamics (QCD) code called MILC. Lattice QCD is a method for studying quarks and gluons, subatomic particles that comprise some of the basic building blocks of our universe. In lattice QCD, the continuum space-time is approximated as a grid or lattice of points. The quark variables are placed on the grid points and the gluon variables are defined on the links joining grid points. By doing calculations with a variety of small grid spacings, physicists can understand what happens to these tiny, mysterious particles in the continuum limit.
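To make the grid picture concrete, here is a minimal sketch in Python of how sites and links on a small four-dimensional lattice might be enumerated. It is an illustration only and is not taken from MILC: the function name build_lattice, the 4 x 4 x 4 x 8 lattice size, and the periodic wrap-around at the edges are all assumptions made for the example.

```python
from itertools import product


def build_lattice(dims):
    """Enumerate sites and forward links of a periodic hypercubic lattice.

    Hypothetical helper for illustration; not part of MILC.
    Quark variables would live on the sites, gluon variables on the links.
    """
    # Every grid point is a tuple of integer coordinates, one per dimension.
    sites = list(product(*(range(n) for n in dims)))

    links = []
    for site in sites:
        for mu, extent in enumerate(dims):
            neighbor = list(site)
            # Assumed periodic boundary: stepping off one edge wraps to the other.
            neighbor[mu] = (site[mu] + 1) % extent
            links.append((site, tuple(neighbor), mu))
    return sites, links


# Assumed example size: 4 x 4 x 4 in space, 8 in time.
sites, links = build_lattice((4, 4, 4, 8))
print(len(sites), "sites,", len(links), "links")
```

With one forward link per site in each of the four directions, the number of links is four times the number of sites, which is why the gluon variables make up much of the data a lattice code has to move.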

In August 2009, Gottlieb came to NCSA for a sabbatical, bringing Torok along. The sabbatical accelerated the researchers' collaboration with the center's Innovative Systems Laboratory, with Gottlieb and Torok meeting weekly with ISL's Volodymyr Kindratenko and Guochun Shi to work on porting MILC to graphics-processing units (GPUs). With their hundreds of small, simple cores focused on computation (with little support for I/O devices, interrupts, and complex assembly instructions), GPUs can be used to dramatically speed up many science and engineering codes. Of course, not every application is a good fit for GPUs, and even for those that are, achieving speedups requires an investment in re-programming.

In MILC, the conjugate gradient algorithm (CG) is responsible for about 60 percent of the processing time, so it became the primary target for accelerating MILC. A group at Boston University previously had written a solver in C for CUDA for Wilson-type quarks, but the MILC collaboration specializes in Kogut-Susskind or staggered quarks. Shi extended the code developed at BU to staggered quarks. (ISL continues to collaborate with BU, Harvard, and Jefferson Lab on adapting lattice QCD code for GPUs.)
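For readers unfamiliar with the algorithm, the textbook conjugate gradient iteration can be sketched in a few lines of Python. This is not the MILC or QUDA solver: a small tridiagonal matrix stands in for the far larger quark operator, and the names conjugate_gradient, apply_operator, and the mass value are assumptions made for the example. What it does show is that each iteration is dominated by one operator application plus a handful of vector updates, the kind of work that depends heavily on memory bandwidth.

```python
def dot(u, v):
    """Inner product of two real vectors stored as Python lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite operator A.

    apply_A is a function returning A applied to a vector.
    Textbook CG for illustration; not the MILC or QUDA implementation.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = apply_A(p)  # the expensive step in a real lattice code
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def apply_operator(v, mass=0.5):
    """Assumed stand-in operator: 1D lattice Laplacian plus a mass term."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 + mass) * v[i] - left - right)
    return out


b = [1.0] * 8
x = conjugate_gradient(apply_operator, b)
residual = max(abs(bi - ai) for bi, ai in zip(b, apply_operator(x)))
print("largest residual component:", residual)
```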

The next step was to try the new code on a single GPU as a test case. "We knew MILC is memory bandwidth bound, and we knew GPUs have great device bandwidth, but we didn't know if it was going to work!" Shi said. "It's trial and error."

That initial effort was a success, with the code running about 100 times faster on a single GPU than it could on a single CPU core. Even comparing a single GPU to a multi-core CPU node in multi-GPU runs, the CG speedup was still close to 10x, enhanced performance that was well worth the development effort. The next step was adapting the code to run on multiple GPUs, and now Shi is working on scaling the code up to run on 100 or more GPUs. The GPU code, called QUDA, is freely available through the SciDAC QUDA libraries.
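As a rough back-of-the-envelope check, Amdahl's law shows how a speedup in one routine translates into a speedup for a whole run. The sketch below plugs in the two figures quoted above, CG at about 60 percent of run time and a CG speedup of about 10x. It is an estimate under those assumptions, not a measured MILC result, and the 95 percent case is a purely hypothetical value for a code that spends nearly all its time in CG.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only part of the run is accelerated."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


# Figures quoted in the article: CG is about 60% of run time, sped up about 10x.
milc_estimate = overall_speedup(0.60, 10.0)

# Hypothetical CG-dominated application (the 0.95 fraction is an assumption).
cg_dominated = overall_speedup(0.95, 10.0)

print(f"60% in CG: about {milc_estimate:.1f}x overall")
print(f"95% in CG: about {cg_dominated:.1f}x overall")
```

The larger the share of run time spent in the accelerated routine, the closer the overall gain gets to the routine's own speedup.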

As the GPU efforts were going on, Torok was working on adding electromagnetic effects to MILC to make the simulations "closer to what really happens in nature."

"It just so happened that the MILC electromagnetic code spent almost all of its time in the conjugate gradient routine, which is the one that Guochun ported first, therefore making the E&M application a perfect testbed for the GPU code," Torok says. So as Torok began using EcoG for his research this winter, he was able to provide Shi with valuable feedback on the code. "It's really robust code now, because it's gone back and forth so many times with a real application," he says.

And that feedback also gives Shi the satisfaction of knowing his work is advancing real research. "It's a real impact. It's real stuff."

Torok also uses the Lincoln cluster at NCSA and the Longhorn system at the Texas Advanced Computing Center, but for smaller lattice sizes, EcoG is a great, no-waiting option.

"I've seen him filling the whole queue. He's probably run a 1,000 jobs on EcoG," Shi says. "I doubt Aaron could get that much computation time otherwise."

Work on porting MILC to GPU has been supported by the Illinois Institute for Advanced Computing Applications and Technologies.

Torok's work is supported by grants from the National Science Foundation and the Department of Energy.