released 04.15.09
The National Science Foundation (NSF) outlined several science and engineering problems that a sustained petascale computer should address when it issued the solicitation for Blue Waters. Through the PACTs, scientists and engineers who use these applications are working closely with computing experts to address the specific challenges outlined by NSF, as well as other computational problems at this large scale. Lessons learned in optimizing one application will be applied to others as appropriate.
MILC Code Lattice QCD
Lattice quantum chromodynamics (QCD) is a method for studying quarks and gluons, subatomic particles that are among the basic building blocks of our universe. In lattice QCD theory, physicists envision space-time as a crystalline lattice where quarks can be found only on vertices and gluons can travel only along lines connecting quarks. By simulating the evolution of the lattice system as the spacing between vertices changes, physicists hope to understand the behavior and interaction of these tiny, mysterious particles.
The NSF solicitation described the following lattice QCD challenge:
A calculation in which 50 gauge configurations are generated on an 84^3 x 144 lattice with a lattice spacing of 0.06 fermi, the strange quark mass m_s set to its physical value, and the light quark mass m_l = 0.05 m_s. The target wall-clock time for this calculation is 30 hours.
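For a rough sense of scale, the figures in the problem statement can be turned into a site count and a time budget per configuration. The sketch below reads the lattice as 84 sites in each of three spatial directions and 144 in time; it is only arithmetic on the quoted numbers, and the variable names are illustrative rather than taken from MILC.

```python
# Back-of-the-envelope arithmetic on the NSF lattice QCD problem.
# All names here are illustrative; none come from the MILC code.

SPATIAL_EXTENT = 84      # sites along each of the three spatial directions
TEMPORAL_EXTENT = 144    # sites along the time direction
CONFIGURATIONS = 50      # gauge configurations to generate
TARGET_HOURS = 30        # target wall-clock time for the whole calculation

lattice_sites = SPATIAL_EXTENT ** 3 * TEMPORAL_EXTENT
minutes_per_configuration = TARGET_HOURS * 60 / CONFIGURATIONS

print(f"lattice sites: {lattice_sites:,}")
print(f"time budget per configuration: {minutes_per_configuration:.0f} minutes")
```

The lattice works out to roughly 85 million sites, with about 36 minutes of wall-clock time available per gauge configuration.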
MILC frequently passes many small messages and also requires frequent collective communication, during which all of the processors are communicating simultaneously. This communication load requires a network with very low latency. The lattice QCD PACT is working to increase MILC's scalability by increasing the amount of overlap between communication and computation operations. The PACT is also improving the efficiency of the conjugate gradient algorithm that is part of MILC in order to reduce the number of required iterations and will be exploring alternatives to this algorithm in order to make the code operate more efficiently.
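To make the role of the conjugate gradient algorithm concrete, here is a minimal textbook version applied to a tiny symmetric positive-definite system. It is a sketch, not MILC's solver: MILC applies the method to a far larger lattice operator distributed over many processors, and the function names and the 3x3 test matrix below are made up for illustration. Each iteration needs inner products over the whole vector, which in a parallel run typically become global reductions, so fewer iterations also means fewer collective operations.

```python
# Textbook conjugate gradient for a small symmetric positive-definite system.
# Illustrative only: function names and the test matrix are hypothetical and
# do not come from MILC.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A; return (x, iterations_used)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)         # inner product (a global reduction in parallel)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 < tol:
            return x, iteration
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, iterations = conjugate_gradient(A, b)
print("solution:", solution)
print("iterations:", iterations)
```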
PACT members include Greg Bauer at NCSA and Steven Gottlieb at Indiana University (one of the MILC developers).
NAMD Molecular Dynamics
Today's high-performance computers enable scientists to simulate systems as large as several million atoms, but this still falls short of capturing the full complexity and extended timescale of important biological processes. The NSF has outlined a molecular dynamics problem involving 100 million atoms, a level at which biophysicists will be able to simulate the interactions of viruses with cell membranes and embedded proteins, ribosomes and the mechanism of protein construction, and the protein-folding problems that contribute to many diseases. The NSF solicitation specified:
Simulation of curvature-inducing protein BAR domains binding to a charged phospholipid vesicle over 10 nanoseconds simulation time under periodic boundary conditions. The vesicle, 100 nm in diameter, should consist of a mixture of dioleoylphosphatidylcholine (DOPC) and dioleoylphosphatidylserine (DOPS) at a ratio of 2:1. The entire system should consist of 100,000 lipids and 1,000 BAR domains solvated in 30 million water molecules, with NaCl also included at a concentration of 0.15 M, for a total system size of 100 million atoms. All system components should be modeled using the CHARMM27 all-atom empirical force field. The target wall-clock time for completion of the problem using NAMD with the velocity Verlet time-stepping algorithm, Langevin dynamics temperature coupling, Nose-Hoover Langevin piston pressure control, the Particle Mesh Ewald algorithm with a tolerance of 1.0e-6 for calculation of electrostatics, a short-range (van der Waals) cut-off of 12 Angstroms, and a time step of 0.002 ps, with 64-bit floating-point (or similar) arithmetic, is 25 hours. The positions, velocities, and forces of all the atoms should be saved to disk every 500 time-steps.
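The numbers in this problem statement imply a per-step time budget and an output volume that are easy to work out. The sketch below is arithmetic only; the assumption that every saved value is stored as an uncompressed 64-bit float is not part of the solicitation, so the size estimate is a rough guide.

```python
# Arithmetic on the NSF molecular dynamics problem statement.
# Assumption (not in the solicitation): each saved value is an uncompressed
# 64-bit float, so the frame size is only a rough estimate.

SIMULATED_TIME_PS = 10 * 1000   # 10 nanoseconds expressed in picoseconds
TIME_STEP_PS = 0.002
SAVE_INTERVAL_STEPS = 500
ATOMS = 100_000_000
TARGET_HOURS = 25
BYTES_PER_VALUE = 8             # assumed 64-bit float
VALUES_PER_ATOM = 3 * 3         # position, velocity, force; x, y, z for each

time_steps = round(SIMULATED_TIME_PS / TIME_STEP_PS)
saved_frames = time_steps // SAVE_INTERVAL_STEPS
ms_per_step = TARGET_HOURS * 3600 * 1000 / time_steps
bytes_per_frame = ATOMS * VALUES_PER_ATOM * BYTES_PER_VALUE

print(f"time steps: {time_steps:,}")
print(f"saved frames: {saved_frames:,}")
print(f"wall-clock budget per step: {ms_per_step:.1f} ms")
print(f"estimated size per frame: {bytes_per_frame / 1e9:.1f} GB")
```

Under these assumptions the run needs 5 million time steps, leaving about 18 milliseconds of wall-clock time per step, and writes 10,000 frames of roughly 7.2 GB each.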
NAMD, the widely used parallel molecular dynamics program developed by Klaus Schulten's team at Illinois, is being used to solve this molecular dynamics problem.
NAMD, written in C++ using the Charm++ parallel programming model, divides the computational domain into box-shaped patches, each containing varying numbers of atoms. Since the set of patches affected by pair-wise and bond interactions changes as the atoms move about, the dynamic load balancing capability of Charm++ is crucial to scalability, and improving load balancing is one focus of the NAMD PACT. The PACT is also working to overlap communication and computational work and reduce the code's memory usage.
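The idea of box-shaped patches holding uneven numbers of atoms can be illustrated with a toy binning routine. This is a sketch under stated assumptions, not NAMD's implementation: the helper names, the 48 Angstrom toy box, and the choice of a patch edge equal to the 12 Angstrom cutoff are all hypothetical, and real load balancing in Charm++ is far more involved.

```python
# Toy spatial decomposition: bin atoms into box-shaped patches and count them.
# Hypothetical helper names and sizes; NAMD's real patch and load-balancing
# logic lives in its C++/Charm++ source.
import random
from collections import Counter

def patch_index(position, patch_size):
    """Return the integer (i, j, k) index of the patch containing position."""
    return tuple(int(coord // patch_size) for coord in position)

def count_atoms_per_patch(positions, patch_size):
    """Map each occupied patch index to the number of atoms inside it."""
    return Counter(patch_index(p, patch_size) for p in positions)

random.seed(0)
BOX = 48.0          # edge length of a cubic toy box, in Angstroms (assumed)
PATCH_SIZE = 12.0   # patch edge, set equal to the cutoff (assumed)
atoms = [tuple(random.uniform(0.0, BOX) for _ in range(3)) for _ in range(2000)]

counts = count_atoms_per_patch(atoms, PATCH_SIZE)
print("occupied patches:", len(counts))
print("fewest atoms in a patch:", min(counts.values()))
print("most atoms in a patch:", max(counts.values()))
```

Even with uniformly random positions the per-patch counts differ, and as atoms move the counts change. That is the kind of imbalance a dynamic load balancer has to even out across processors.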
PACT members include Sanjay Kale, Celso Mendes and Eric Bohm from the Charm++ group along with Schulten, John Stone, and Jim Phillips from the NAMD development group, all at the University of Illinois.
Pseudospectral method Turbulence
Turbulence problems are everywhere, from designing aircraft and automobiles to predicting the weather to understanding the lifecycle of a star.
The NSF-specified turbulence problem requires the use of the pseudospectral method within a periodic box. Simulations at a higher resolution than ever used before and in this simple setting can be used to validate turbulence approximations made in other more complex simulation codes. The specified problem is:
A simulation of fully developed homogeneous turbulence in a periodic domain with 12,288-cubed grid points for one eddy turnover time with turbulent Reynolds number about 2000. The problem should be solved using a dealiased pseudospectral algorithm, a fourth-order explicit Runge-Kutta time-stepping scheme, 64-bit floating-point (or similar) arithmetic, and a time-step of 0.0001 eddy turnaround times. Full-resolution snapshots of three-dimensional vorticity, velocity, and pressure fields should be saved to disk every 0.02 eddy turnaround times. The target wall-clock time for completion is 40 hours.
Various approaches and codes are being explored within the context of the specified problem, which requires over 30,000 one-dimensional Fast Fourier Transforms (FFTs) and transposes of multiple three-dimensional arrays for each of 10,000 time-steps. Besides high-performance transforms on a single core, the problem demands high-performance data communication across the whole system. The PACT is also looking at performance gains through overlapping FFTs and communication.
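The one-dimensional transforms matter because a pseudospectral method computes spatial derivatives in Fourier space: transform, multiply each mode by its wavenumber, transform back. The sketch below shows that single operation on a small periodic grid. It is illustrative only and not taken from any of the codes under study; a naive O(N^2) discrete Fourier transform stands in for an FFT so that the example needs nothing beyond the standard library, and the function names are made up.

```python
# One-dimensional spectral derivative on a periodic grid, the basic building
# block of a pseudospectral method. A naive O(N^2) DFT stands in for the FFT;
# production codes use optimized FFT libraries. Names are hypothetical.
import cmath
import math

def dft(values, inverse=False):
    """Discrete Fourier transform (forward unnormalized, inverse scaled by 1/n)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                    for m in range(n))
        out.append(total / n if inverse else total)
    return out

def spectral_derivative(samples):
    """Differentiate samples of a 2*pi-periodic function on a uniform grid."""
    n = len(samples)
    coeffs = dft(samples)
    derived = []
    for k, c in enumerate(coeffs):
        wavenumber = k if k < n // 2 else k - n   # map index to signed wavenumber
        if n % 2 == 0 and k == n // 2:
            wavenumber = 0                         # drop the Nyquist mode
        derived.append(1j * wavenumber * c)
    return [z.real for z in dft(derived, inverse=True)]

N = 16
grid = [2 * math.pi * j / N for j in range(N)]
u = [math.sin(x) for x in grid]
du = spectral_derivative(u)
max_error = max(abs(d - math.cos(x)) for d, x in zip(du, grid))
print(f"max error against cos(x): {max_error:.2e}")
```

In three dimensions the same transform is applied along each axis in turn, and on a distributed machine the array has to be transposed between axes so that every transform runs on data local to one processor, which is where the communication cost comes from.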
PACT members include Bob Fiedler, Bob Wilhelmson, Mark Straka, Jeongnim Kim, and Jing Li at NCSA, and Adolfy Hoisie and Darren Kerbyson at Los Alamos National Laboratory.