Pushing limits

05.28.15

NCSA's Private Sector Program (PSP) has a unique opportunity to work with non-academic codes, impact economic development, and make more complex simulations happen for industry partners, says Merle Giles, PSP director. High-performance computing (HPC) has become a core strategic technology. It gives companies deeper insight into product performance and improves productivity by letting engineers evaluate more design variants. But as companies increasingly turn to engineering simulation to cope with time, quality, and cost pressures, compute power has constrained them. Software developers and end users alike face constraints when it comes to testing the limits of codes. They often lack access to truly massive supercomputers, and because they are focused on daily business needs, they can't spare the time and manpower for extreme scaling studies. PSP bridges the gap between partnering companies and the computing resources and expertise they need.

The PSP technical staff focus on scaling the codes, both commercial and open-source, that companies report as most important to them. The goal: to identify where the code and the underlying science need to improve to speed up processing. The result? Massive improvements over prior runs.

Fluent record

One such improvement involved ANSYS® Fluent®, one of the world's most widely used commercial computational fluid dynamics (CFD) simulation codes. NCSA worked with ANSYS to scale Fluent to 36,000 compute cores. This industry first could lead to greater efficiencies and increased innovation throughout manufacturers' product development processes.

"We're connecting all the dots," said Ahmed Taha, the NCSA senior computational resource coordinator who led this extreme benchmarking project. "NCSA is unique in connecting the industrial users, the hardware and software vendors, and the domain expertise of our staff. In addition, this level of scalability for a commercial fluid dynamics solver is unprecedented on our system, especially considering the complexity of the model physics with transient, turbulent flow, chemical species transport, and multiple non-reacting flows."

"Compute power has increased a thousand-fold over the last decade, enabling engineers to solve problems that were once unsolvable," said Wim Slagter, ANSYS HPC product manager. "While most organizations don't have access to 36,000 cores today, it won't be long before these extreme core counts are commonplace. And even today's users who are running at much lower core counts will see direct benefits through considerably greater efficiencies. The results will be more amazing products delivered to customers much faster than ever."

ANSYS and NCSA continue exploring the limits of scale-out computing, including testing and improving the scaling of models involving even more complex physics. Other collaborative efforts include running Fluent on NVIDIA graphics processing units (GPUs), as well as testing the supercomputing limits for applications such as turbomachinery using ANSYS CFX®.

Engineering feat

LS-DYNA, an explicit finite element code used for simulations in the automotive, aerospace, manufacturing, and bioengineering industries, was scaled to 15,000 cores on Blue Waters. At the time, this was a world record for scaling of any commercial engineering code.

"Once Blue Waters was in production, we looked for test cases to run at extreme scale," says Seid Koric, technical program manager with NCSA's PSP and a University of Illinois associate professor of Mechanical Science and Engineering. LSTC, the company that develops LS-DYNA, provided a large license pool for the benchmarking, which allowed the collaborative team to run the software across as many cores as possible. LS-DYNA was quickly scaled to 1,000 cores on Blue Waters. Koric then ran progressively larger real-world problems provided by a PSP partner, pushing the code to 8,000 cores.

Progress was iterative. The team repeatedly analyzed bottlenecks, and the software developers addressed them. For example, Koric worked with Cray and LSTC to distribute the problem efficiently across the system's memory. Performance also improved when the team switched from an MPI solver to a hybrid MPI/OpenMP solver with lower communication overhead and a smaller memory requirement.

With his expertise in both high-performance computing and mechanical engineering, Koric was able to examine the physics of the problem and suggest specific algorithms that might benefit from further parallelization. Cray profiled the code, confirming Koric's assessment, and LSTC removed those algorithmic bottlenecks.

"Cray was very helpful in understanding the system," Koric says. "They know all the tricks of the hardware, particularly when it comes to load balancing and profiling the code, while the same applies to LSTC and their LS-DYNA code."

Intrigued by the performance gains, a second PSP partner provided a complex engineering problem, which the team was able to scale to a world-record 15,000 cores on Blue Waters in January 2014.

Real-world solutions

While everyone enjoys setting records, the real reward comes from delivering real-world solutions, such as the one the PSP team provided to Rolls-Royce. Working together, Rolls-Royce and NCSA's PSP team accelerated simulation and modeling performance for MSC Nastran and other data-intensive manufacturing codes, whose solver data often exceeds the available RAM. The speedup earned 2014 HPCwire Readers' Choice and Editors' Choice awards, because the results enabled Rolls-Royce to transform its design chain and develop designs better, faster, and cheaper.

Alya award-winning record

PSP also earned an HPCwire Editors' Choice award in November 2014 for collaborating with the Barcelona Supercomputing Center (BSC) to scale BSC's Alya multiphysics code to an unprecedented 100,000 cores on the Blue Waters supercomputer. According to Koric, "it would take 17.4 years for a serial code to do what Alya on 100,000 cores of Blue Waters can do in less than two hours."

"These unprecedented results contradict the common belief that engineering simulation codes do not scale efficiently in large supercomputers, opening a new wide horizon of potential applications in the industrial realm," says Koric.
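Koric's 17.4-year comparison can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes a 365.25-day year and treats "17.4 years" as serial wall-clock time. It converts that figure to hours, divides by 100,000 cores to get the runtime perfect linear scaling would give, and computes the parallel efficiency implied by a two-hour run. The function names are hypothetical, and the outputs are derived from the quoted numbers, not from measured Alya results.

```python
# Back-of-the-envelope check of the quoted Alya figures.
# Assumption: one year = 365.25 days; all other inputs come from the quote.

HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def ideal_parallel_hours(serial_years: float, cores: int) -> float:
    """Runtime on `cores` cores if scaling were perfectly linear."""
    return serial_years * HOURS_PER_YEAR / cores


def parallel_efficiency(serial_years: float, cores: int, parallel_hours: float) -> float:
    """Speedup divided by core count: T_serial / (cores * T_parallel)."""
    serial_hours = serial_years * HOURS_PER_YEAR
    speedup = serial_hours / parallel_hours
    return speedup / cores


if __name__ == "__main__":
    ideal = ideal_parallel_hours(17.4, 100_000)
    # "Less than two hours" means the true efficiency is at least this value.
    min_eff = parallel_efficiency(17.4, 100_000, 2.0)
    print(f"Ideal runtime on 100,000 cores: {ideal:.2f} h")      # 1.53 h
    print(f"Implied efficiency at 2 h: at least {min_eff:.0%}")  # 76%
```

The numbers are mutually consistent: perfect scaling would finish in about 1.5 hours, so a run under two hours corresponds to at least roughly 76% of ideal speedup. If the 17.4-year figure was itself obtained by multiplying the core count by the wall time, it represents aggregate core-hours rather than a measured serial run. In that case the arithmetic simply recovers a runtime of about 1.5 hours.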

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.