Five in focus, Fall 2006
Story posted December 12, 2006
Petascale computing is now a realizable goal. But the path that will expose the full potential of petascale computers, allowing these machines to produce meaningful science and engineering results, is fraught with challenges. NCSA is working with partners around the country to overcome them. Here are five examples:
1. System software. Operating a computer of this scale requires compilers that translate high-level languages like Fortran into machine-ready code, checkpointing software that lets a simulation run recover should the system crash during a calculation, data management software, and so forth.
2. Applications. These codes -- which run simulations, analysis, and other important calculations -- won't just automatically run on a system at this new level of scale. The codes will have to be re-imagined, and users will need intense support.
3. Systems. What hardware -- and how much -- is needed for a petaflops system is a crucial question. There's a need to understand what's imminent in the hardware area and how it will work.
4. Storage. A new scale of computing involves a new scale of archival storage. (See below.)
5. Cyberenvironments. NCSA's partnerships with research communities are the bridge that will bring together the new hardware, the new applications, and the researchers, allowing them to draw upon this extraordinary resource.
4. The right stuff
A petascale storage environment is full of interesting avenues for research, says Michelle Butler, technical program manager for NCSA's storage enabling technology group. For example, storage is key to a balanced system, one that can run without input/output (I/O) trouble and thus function efficiently during even the most taxing calculations. One area being explored is how much storage a system of a given petaflops rating requires to achieve that balance. The performance of the file system for the designated storage functions also needs to be examined, she notes.
Other areas that NCSA is exploring are how much storage will be required by applications running at petaflops speeds, and how to efficiently serve data from the archive environment. With a huge number of disk drives, she says, it is important to look at how the environment functions while keeping data integrity at the highest level; that is, not losing users' data owing to normal types of failures.
But all the research questions come down to this basic thought, says Butler: How do you keep a petascale machine's I/O wait at a minimal level? (I/O wait is when the CPU is waiting for some kind of storage system to reply about a request -- read/write/fetch.) "The I/O wait needs to be kept at a minimum," she explains, "because a machine this powerful shouldn't be sitting idle waiting for data to show up." The bottom line: it is crucial to dedicate enough resources to facilitate the easy flow of information into and out of the machine. The trick is knowing how much is enough. But the combined efforts of NCSA and its partners will answer that question.
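To make the idea of I/O wait concrete, here is a minimal back-of-the-envelope sketch in Python. It is not NCSA's method. It assumes, pessimistically, that computation and I/O never overlap, and the function names and parameters are hypothetical, chosen only for illustration.

```python
def io_wait_fraction(compute_seconds, bytes_moved, bandwidth_bytes_per_s):
    """Fraction of wall-clock time spent waiting on storage.

    Assumption (illustrative only): the CPU sits idle for the whole
    duration of every transfer, i.e. compute and I/O do not overlap.
    """
    io_seconds = bytes_moved / bandwidth_bytes_per_s
    return io_seconds / (compute_seconds + io_seconds)


def bandwidth_needed(compute_seconds, bytes_moved, target_fraction):
    """Lowest storage bandwidth that keeps I/O wait at or below target_fraction.

    Derived from f = t_io / (t_compute + t_io), which rearranges to
    t_io = f * t_compute / (1 - f). Requires 0 < target_fraction < 1.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    max_io_seconds = target_fraction * compute_seconds / (1.0 - target_fraction)
    return bytes_moved / max_io_seconds
```

Under these assumptions, a job that computes for 90 seconds and moves 100 units of data over a link that delivers 10 units per second spends one-tenth of its run waiting. Working the formula the other way gives the bandwidth needed to hold I/O wait to a chosen ceiling, which is one simple way to frame the "how much is enough" question.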