Workload study: Blue Waters enables large-scale science

03.31.17

A technical report analyzing the use and performance of NCSA's Blue Waters supercomputer, covering all the scientific applications it ran from its launch in April 2013 until September 2016, shows that Blue Waters has spent the majority of its computing time on large-scale scientific applications. These include projects like understanding the 160-million-atom flu virus capsid, or creating high-resolution 3D maps of the Arctic from massive amounts of satellite data. The paper also shows that many of these large applications could only be run on Blue Waters.

Blue Waters is housed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and is the largest leadership-class supercomputer funded by the National Science Foundation.

The supercomputer is often referred to as a leadership machine because it pushed the boundaries of supercomputing when it was first deployed, and attempted a unique hardware architecture focused on balance that still rivals newer systems. The report shows how different communities and fields of science have made effective use of the system since full service began in 2013, providing insight to the supercomputing community on the benefits of some of the hardware decisions the Blue Waters project made.

"The goal three years ago was to make a machine that could perform data-focused and large-scale scientific applications, as well as jobs that might have different needs, like model refinement and simulations," said Bill Kramer, principal investigator of the Blue Waters project, adding that for balance, the supercomputer has significant investments in memory and network bandwidth, as well as enough processors to have a sustained performance of one quadrillion calculations per second.

The paper, funded by the NSF, was published by a collaboration between the research team at the University at Buffalo's Center for Computational Research (CCR) and the Blue Waters project at NCSA. It makes use of Blue Waters' advanced Cray monitoring system with analysis software from Sandia National Laboratories. NCSA and Buffalo researchers collected and analyzed almost 100 TB of use and performance data as well as insight from science teams. The Buffalo team enhanced the open source XDMod (XD Metrics on Demand) system to ingest and analyze the data.

The monitoring data was recorded extensively over the past three years, making this workload study highly detailed. Other supercomputers typically do not monitor their scientific workloads in the same detail, both because it is hard to avoid slowing down the applications being monitored and because the analysis itself requires a large amount of computation. The Cray hardware in Blue Waters has a built-in monitoring system that eliminates any impact on the science applications it monitors, allowing the Blue Waters team to build out a treasure trove of data not only for workload studies, but for vital troubleshooting when helping scientists. Hundreds of nodes on Blue Waters were used to analyze the data for the workload report.

Blue Waters is the only system that can handle certain large scientific applications

"From year one to year three, the workload study shows an increase in data-focused applications while at the same time, the large scale simulations continued to increase in performance and effectiveness. Both by using a large number of nodes, and/or a large amount of memory, but in different ways," Kramer said.

The binned scatter plot below shows memory usage of science applications run on the XE nodes. Job size by node count is plotted on the x-axis and the job's total peak memory usage on the y-axis. The color of each bin indicates the total wall hours for jobs in that bin, on a log scale. For example, a point at the top right of the plot indicates that one or more jobs ran on approximately 22,600 nodes and used about 1.4 PB of memory, or about 97% of the total memory available on those nodes.

Binned scatter plot showing memory usage of science applications run on Blue Waters XE nodes
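The 97% figure in the example above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 64 GB of memory per XE node (a figure not stated in this article) and decimal units (1 PB = 1,000,000 GB), and the function name is hypothetical.

```python
# Assumption (not stated in the article): each XE node has 64 GB of memory.
XE_NODE_MEMORY_GB = 64
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB


def memory_fraction_used(nodes: int, peak_memory_pb: float,
                         node_memory_gb: float = XE_NODE_MEMORY_GB) -> float:
    """Return the share of a job's aggregate node memory in use at peak."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    # Total memory available across all nodes the job ran on, in PB.
    available_pb = nodes * node_memory_gb / GB_PER_PB
    return peak_memory_pb / available_pb


if __name__ == "__main__":
    # The job described in the text: about 22,600 nodes and about 1.4 PB.
    fraction = memory_fraction_used(nodes=22_600, peak_memory_pb=1.4)
    print(f"{fraction:.0%} of available memory")  # prints "97% of available memory"
```

Under these assumptions, 22,600 nodes provide roughly 1.45 PB in total, so a job peaking at 1.4 PB is using nearly all of the memory on the nodes it occupies.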

"The points that are to the right of 11,300 nodes and above 800 TB could not be executed on any other open system in the U.S. They couldn't scale any farther to the right, because there just isn't a system that can take on jobs of those size," said Kramer.

One of the projects in that region of the plot, where no other supercomputer would work, was led by Carnegie Mellon University astronomer Tiziana Di Matteo. While it wasn't her first simulation on a leadership-class supercomputer, it was her most detailed, allowing her to see the first quasars in her simulation of the early universe.

"The Blue Waters project," Di Matteo wrote in a Blue Waters report, "made possible this qualitative advance, making possible what is arguably the first complete simulation (at least in terms of the hydrodynamics and gravitational physics) of the creation of the first galaxies and large-scale structures in the universe."

Study shows accelerator nodes are being used and demand is increasing

Blue Waters has 22,640 XE6 compute nodes, each of which contains two CPU processor modules. It also has more than 4,224 XK7 nodes, each of which contains one CPU processor module and one GPU "accelerator." Accelerators, as a video from NVIDIA and "Mythbusters" explains, have thousands more processing units, or cores, than CPU processors, but need to be programmed with different methods.

The XK7 nodes are as heavily utilized as the XE nodes, but a relatively small number of applications can use them well. Over the past three years, however, the number of science teams using the XK nodes has steadily increased.

Demand is increasing for accelerator nodes, Kramer says, because of the work of the Blue Waters project's Petascale Application Improvement Discovery (PAID) program. This program funds multiple teams to help interested scientists adjust their code to work on accelerator nodes, among other things.

Many science teams have yet to rework their codes to run on GPU nodes, but the workload study found that five of the larger projects Blue Waters has run used them. A diverse set of smaller groups used the accelerator nodes as well.

"When the system was originally configured, it was not clear what balance of CPU or GPU should be in the system. We set the ratio based on analysis of the science teams approved to use Blue Waters and consultation with accelerated computing experts," said Greg Bauer, applications technical program manager at NCSA. "The workload study shows the balance we went with is very reasonable, and that we were ready to keep up with the demand for the first three years."

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.