Streamlining simulation

12.04.14 -

The first fully functional version of CyberShake ran on 800 processors on NCSA’s Mercury in 2008. Six years and a few versions later, the SCEC team scaled up the code to run on 295,040 processors on Blue Waters in early 2014 (the run was dubbed CS14.2). The net effect was that the simulations ran about 200 times faster and created more than 600 times the data. The simulations finished in weeks instead of months, accelerating code development and allowing the resolution to approach the fineness that engineers and seismic hazard assessments require. This most recent experiment included 1,144 sites in Southern California.

Porting the full workflow to Blue Waters would not have been possible without the support of NCSA’s Omar Padron and Bill Kramer, says Maechling. In addition to Blue Waters’ outstanding specifications, implementing a more efficient workflow was critical to the project’s success.

The workflow for CS14.2 involved three programs: the Unified Community Velocity Model (UCVM), an Anelastic Wave Propagation model (AWP-ODC), and SeisSynth. UCVM and AWP-ODC ran one and two jobs per site, respectively. SeisSynth ran 415,000 jobs per site, one for each possible fault rupture scenario, or a total of about 235 million very short serial jobs that generate possible seismograms for each site. SeisSynth required a new workflow that bundled these millions of very small jobs into a single batch submission. This change, along with the integration of GPUs in the AWP-ODC step, reduced the application makespan (the total elapsed time from the start of the first job to the end of the last) for 1,144 sites from 1,467 hours to 342 hours in less than one year.
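The bundling idea above can be illustrated with a minimal sketch. This is not SCEC's actual code; the function name `synthesize_seismogram` and the task structure are hypothetical placeholders. The point is the pattern: rather than submitting millions of tiny serial jobs to the scheduler one at a time, the jobs are fanned out to workers inside a single allocation, so the scheduler sees one submission instead of millions.

```python
# Illustrative sketch (not SCEC's code) of running many short serial
# tasks inside one batch submission instead of one scheduler job each.
from concurrent.futures import ProcessPoolExecutor

def synthesize_seismogram(rupture_id):
    # Hypothetical stand-in for one short serial task: computing the
    # seismogram for a single fault rupture scenario at one site.
    return rupture_id, rupture_id * 2  # (scenario id, stand-in result)

def run_bundled(rupture_ids, workers=4):
    # One "submission", many tasks: the process pool plays the role of
    # the many-task engine that fans work out inside the allocation.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(synthesize_seismogram, rupture_ids))

if __name__ == "__main__":
    results = run_bundled(range(100))
    print(len(results))  # all 100 scenarios completed in one bundle
```

In production, workflow middleware (the article mentions SCEC's use of scientific workflow tools) performs this clustering across thousands of compute nodes rather than local processes, but the scheduling benefit is the same: one queue entry replaces millions.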

Automating the calculations as much as possible through scientific workflows is a big step toward using the CyberShake model in an operational setting, says Maechling.

“Any step that involves a human quickly becomes a bottleneck when scaling up,” says Scott Callaghan, a research programmer at SCEC. “We executed almost 100 million jobs over two weeks, and at peak were using 9,220 nodes, all of which was managed via the scientific workflow middleware that we use.”

SCEC’s success using a complex scientific workflow inspires others in the supercomputing community to pursue a similar process, says Omar Padron, a research programmer at the National Center for Supercomputing Applications. He says these efforts range from improving support for workflows on supercomputers like Blue Waters to developing best practices for projects looking into using workflows.

As supercomputers continue to grow, Maechling expects workflows to become even more important in supercomputing.

“They enable groups like us to run applications at a scale that would be impossible otherwise,” says Callaghan.


Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.