Leaving the dark days

11.12.09

As part of the Petascale Computing Resource Allocations program, a longtime NCSA collaborator and his team will make the improvements needed to run a popular cosmology simulation code on Blue Waters.

Brian O'Shea started as an assistant professor at Michigan State University just last year, but he's already been at this a long time. For more than a decade, he's been using NCSA supercomputing resources to simulate how galaxies form in the early universe. He began as an undergrad at the University of Illinois working at NCSA with Mike Norman, who is now interim director of the San Diego Supercomputer Center.

"Those were dark days," he says with a chuckle. "We could run a simulation on 128 processors, and I believe I shepherded a single simulation through the machine for an entire year. There were maybe 1,000 galaxies in the entire simulation."

The NCSA relationship continues as O'Shea is one of the first researchers to win a National Science Foundation Petascale Computing Resource Allocations award. These research teams will work closely with the Blue Waters project team at NCSA and the University of Illinois in preparing their codes to run on the sustained-petascale supercomputer.

With Blue Waters, O'Shea expects to see more than 1,000 times the mass resolution and 32 times the spatial resolution when compared to those simulations from the late 1990s. In other words, he expects to probe the universe to much smaller physical scales.

That will translate into simulations of hundreds of thousands of galaxies instead of a thousand galaxies.

Simulations on Blue Waters should allow the team to get a much better look at the first billion years after the Big Bang, a time in the development of the universe that is only murkily understood. "A whole lot of stuff went on in that first billion years... A lot of galaxy formation took place. The universe was dense. Everything was close together. The rate at which things happened was really, really fast," he says.

Keeping up with the Webb

Star gazing is also gazing back in time. Objects are so distant that the light from them takes millions or billions of years to reach us. We're observing things as they appeared way back then. What's happening today won't be visible for eons.

In 2013, NASA will launch the James Webb Space Telescope. Its primary function will be to look back on these most distant—and thus oldest from our perspective—galaxies. Other telescopes with similar capabilities will follow suit.

O'Shea, Norman, and their collaborators will use Blue Waters and other petascale systems to make predictions about what the Webb telescope is going to see and how galaxies formed in the first billion years after the Big Bang. They'll also explore what's known as reionization, the process by which galaxy formation caused the cold, neutral gas in the early universe to heat up and lose its electrons, significantly changing its properties.

"We've been more worried about matching observations at later times. The universe started out with tiny galaxies, and they merged together to make bigger and bigger galaxies over time. So nowadays, we mostly worry about really big galaxies like the Milky Way," O'Shea says.

But in the coming era of instruments like the Webb telescope and Blue Waters, the focus will change. "What we're interested in at times early on is maybe one ten-thousandth the size of the Milky Way. And lots and lots of them."

"These are questions that we never would have dreamed of being able to attack on a computer 10 years ago. It's just such a comically large amount of computing time... With Blue Waters, we can attack these huge problems that we never would have been able to before."

Diving into Blue Waters

O'Shea and his colleagues do their work on large supercomputers, day in and day out. Still, running on Blue Waters will require several significant improvements to the simulation code they use.

Called Enzo, the code was conceived in the 1990s by Greg Bryan, who was part of Norman's team at NCSA. The code now has more than a dozen developers across the country working on it and has users at more than a dozen universities.

O'Shea's work with the Blue Waters team will focus on issues of scaling—unsurprising, given that the code was first developed to run on systems with hundreds of processors and Blue Waters will have hundreds of thousands. Enzo is an adaptive mesh refinement code. It automatically zeros in on the more interesting parts of the model and simulates those sections in greater resolution. This approach makes the overall simulation less expensive computationally at the cost of greater simulation complexity.
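The idea of refining only where the model is interesting can be illustrated with a small one-dimensional sketch. This is not Enzo's algorithm: the density profile, the refinement criterion (the change in density across a cell), and the names `refine` and `density` are all assumptions made for illustration, and the sketch splits cells rather than layering overlapping grids as Enzo does.

```python
import math

def density(x):
    # Hypothetical density field: one narrow peak on a nearly empty background.
    return math.exp(-((x - 0.5) / 0.02) ** 2)

def refine(f, lo, hi, level, max_level, tol):
    """Return (lo, hi, level) cells covering [lo, hi].

    A cell is split in two when f changes by more than tol across it,
    until max_level is reached.
    """
    if level < max_level and abs(f(hi) - f(lo)) > tol:
        mid = 0.5 * (lo + hi)
        return (refine(f, lo, mid, level + 1, max_level, tol)
                + refine(f, mid, hi, level + 1, max_level, tol))
    return [(lo, hi, level)]

N_COARSE = 8    # coarse cells covering the unit interval
MAX_LEVEL = 6   # deepest refinement allowed

cells = []
for i in range(N_COARSE):
    cells += refine(density, i / N_COARSE, (i + 1) / N_COARSE,
                    0, MAX_LEVEL, 0.1)

# A uniform mesh at the finest resolution would need this many cells.
uniform_cells = N_COARSE * 2 ** MAX_LEVEL
```

Comparing `len(cells)` with `uniform_cells` shows the saving: fine cells cluster around the peak while the empty regions stay coarse. The price is that the code must now track cells of several sizes and how they relate to one another.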

"A given point in space is covered by overlapping boxes, grids of cells at different resolutions. You have to keep all of that synced up. The grids need to know if there are finer grids below them, coarser grids above them and so on and so forth," O'Shea says. "That is a bookkeeping nightmare."

Currently, each processor that the simulation is running on keeps a copy of the entire grid hierarchy, tracking where all the cells are and how they are all related to one another. On older simulations, with fewer grids and lower resolution, this wasn't an issue. Each processor might be keeping a few megabytes of such data. On today's simulations, that number has jumped to two gigabytes of data—a thousand times more.

"There's a tremendous amount of redundant information and bookkeeping to do, and that's what gets in the way of scaling to a gigantic extent," O'Shea says.

With this in mind, the team will work on streamlining the way the bookkeeping is done. Instead of each processor keeping all of the grid hierarchy, the code will be rewritten such that each processor will only know the details of the immediate neighbors of the grids it is working on. They also plan to introduce a chaining mesh into the code, which allows the processors to call for additional information from more distant grids if it is needed.
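A rough sketch of how neighbor-only bookkeeping and a chaining mesh might fit together follows. The article does not describe Enzo's actual data structures, so everything here is an assumption for illustration: grids are one-dimensional integer intervals, ownership is assigned in contiguous blocks, and the function names are hypothetical. The chaining mesh is modeled as a coarse set of buckets, each listing the grids that overlap it, so a neighbor search only inspects nearby buckets.

```python
from collections import defaultdict

def build_chaining_mesh(grids, bucket_size):
    """Map each coarse bucket index to the ids of grids overlapping it."""
    mesh = defaultdict(list)
    for gid, (lo, hi) in grids.items():
        for b in range(lo // bucket_size, (hi - 1) // bucket_size + 1):
            mesh[b].append(gid)
    return mesh

def neighbors(gid, grids, mesh, bucket_size):
    """Find grids that touch or overlap gid, searching only nearby buckets."""
    lo, hi = grids[gid]
    found = set()
    # Look one cell past each end so grids sharing a boundary are included.
    for b in range((lo - 1) // bucket_size, hi // bucket_size + 1):
        for other in mesh.get(b, ()):
            olo, ohi = grids[other]
            if other != gid and olo <= hi and lo <= ohi:
                found.add(other)
    return found

def local_view(rank, owner, grids, mesh, bucket_size):
    """Grids a rank must know about: its own plus their immediate neighbors."""
    mine = {g for g, r in owner.items() if r == rank}
    view = set(mine)
    for g in mine:
        view |= neighbors(g, grids, mesh, bucket_size)
    return view

# Hypothetical setup: 64 grids of 4 cells each laid end to end,
# shared among 8 ranks in contiguous blocks of 8 grids.
grids = {g: (4 * g, 4 * g + 4) for g in range(64)}
owner = {g: g // 8 for g in grids}
mesh = build_chaining_mesh(grids, bucket_size=8)

view_rank3 = local_view(3, owner, grids, mesh, bucket_size=8)
```

In this toy setup, rank 3 tracks only its own eight grids and the two that border them, rather than all 64. Information about more distant grids would be looked up through the mesh only when needed.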

The team will also shift Enzo over to a hybrid parallelism model, using MPI to communicate between nodes on supercomputers and OpenMP within a node. A node on a supercomputer is a collection of processors that, in this case, share memory. Nodes are then connected to one another by ultrafast networks. By matching the method of exchanging information to the speed of the connection between processors, researchers using the code will get the best of both worlds.
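One side effect of the hybrid model can be shown with back-of-envelope arithmetic. Threads within a single process share memory, so any data that is replicated once per MPI process is stored fewer times on each node when one process spans many cores. The sketch below assumes the two-gigabyte hierarchy copy mentioned above is still replicated per process. The node size of 32 cores is a hypothetical figure, not a Blue Waters specification, and the function name is illustrative.

```python
def hierarchy_gb_per_node(cores_per_node, threads_per_rank, hierarchy_gb):
    """Memory one node spends on hierarchy copies, one copy per MPI rank."""
    ranks_per_node = cores_per_node // threads_per_rank
    return ranks_per_node * hierarchy_gb

CORES_PER_NODE = 32  # hypothetical node size, for illustration only
HIERARCHY_GB = 2     # per-process hierarchy size quoted in the article

# Flat MPI: one rank per core, so every core holds its own copy.
flat_mpi = hierarchy_gb_per_node(CORES_PER_NODE, 1, HIERARCHY_GB)

# Hybrid: one rank per node, with OpenMP threads sharing a single copy.
hybrid = hierarchy_gb_per_node(CORES_PER_NODE, CORES_PER_NODE, HIERARCHY_GB)
```

Under these assumptions the per-node cost falls from `flat_mpi` to `hybrid` gigabytes. This is separate from the communication benefit the team describes and from the planned removal of the replicated hierarchy itself.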

"Those changes are going to give us huge improvements in performance," O'Shea says. "It'll help performance on any system, but it's what will actually allow us to do computation on Blue Waters at all."

This project was funded by the National Science Foundation.

Team members
Brian O'Shea, Michigan State University
Michael Norman, San Diego Supercomputer Center
Robert Harkness, San Diego Supercomputer Center
James Bordner, University of California, San Diego
Matthew Turk, University of California, San Diego

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.