04.28.11 - Permalink
by Thom Dunning
Last fall our Chinese supercomputing colleagues caused a stir in the United States, as well as the rest of the world, by announcing the Tianhe-1A supercomputer.
Tianhe-1A, with a Linpack benchmark score of 2.57 petaflop/s (PF/s), captured the #1 spot on the Top500 List, shoving aside Oak Ridge's Jaguar at 1.76 PF/s. Tianhe-1A was even referred to in President Obama's State of the Union address on January 25: "just recently, China became the home to ... the world's fastest computer." The rapid rise of China in the Top500 rankings (another Chinese supercomputer, Nebulae, is #3 on the Top500 List) reflects China's significant investments in high-performance computing, investments that encompass education and research in a broad range of HPC technologies. There is no doubt that China is catching up fast, but one should not relegate the United States to a second-class spot in supercomputing so quickly.
Tianhe-1A achieves a peak performance of 4.70 PF/s by deploying 7,168 NVIDIA Fermi GPUs in addition to 14,336 Intel Nehalem CPUs. Nearly 80 percent of Tianhe-1A's performance comes from the GPUs. The problem is that, at the present time, there are few science and engineering applications that can take full advantage of the massive numerical performance of GPUs. So the performance of Tianhe-1A on nearly all existing science and engineering applications will only be a fraction of that achieved on the Linpack benchmark. As a result, Jaguar, despite its #2 ranking on the Top500 List, will likely outperform Tianhe-1A by a substantial margin on most current science and engineering applications. The extraordinary performance provided by GPUs is certainly of interest in our country. In fact, NCSA will shortly be deploying its second NSF-allocated CPU-GPU system to enable further exploration and use of this technology, which is only one part of our HPC ecosystem.
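As a rough sanity check on those figures, one can estimate each device class's contribution to peak performance from commonly cited per-device numbers. The per-device peaks below are assumptions drawn from the published specifications of the Tesla M2050 GPU and six-core Xeon X5670 CPU, not figures from this article:

```python
# Back-of-the-envelope check of the "nearly 80 percent" claim.
# Assumed per-device double-precision peaks (not from the article):
#   ~515 GFLOP/s per NVIDIA Fermi (Tesla M2050) GPU
#   ~70 GFLOP/s per six-core Intel Xeon X5670 CPU (6 cores x 2.93 GHz x 4 flops/cycle)
gpu_pf = 7168 * 515e9 / 1e15    # GPU contribution to peak, in PF/s
cpu_pf = 14336 * 70e9 / 1e15    # CPU contribution to peak, in PF/s

print(f"Total peak:       {gpu_pf + cpu_pf:.2f} PF/s")
print(f"GPU share of peak: {gpu_pf / (gpu_pf + cpu_pf):.0%}")
```

Under these assumptions the total lands close to the quoted 4.70 PF/s peak, with the GPUs supplying roughly four-fifths of it, consistent with the "nearly 80 percent" figure.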
The Chinese achievement highlights a flaw in the metric used to rank supercomputers on the Top500 List. We have long known that the Linpack benchmark was not an ideal metric. It measures just one element of the performance of the computer: its ability to solve a dense system of linear equations. This metric is important in some, but certainly not all, science and engineering applications. Despite attempts by Jack Dongarra, the developer of the Linpack benchmark, and his colleagues to convince the high-performance computing community to use a more comprehensive set of metrics (e.g., the HPC Challenge benchmarks), the community has clung to the Linpack benchmark. This benchmark is, after all, simple and, unlike the HPC Challenge benchmarks, gives a single number, making it straightforward to identify the "most powerful computer." But what does it mean when a computer runs the Linpack benchmark well, but many science and engineering applications poorly?
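To make concrete what the benchmark actually measures, here is a minimal Linpack-style timing sketch, not the real HPL code, using NumPy's LAPACK-backed dense solver and the conventional 2/3·n³ + 2·n² flop count for an LU factorization plus triangular solves:

```python
import time
import numpy as np

def linpack_style_gflops(n=2000, seed=0):
    """Time the solution of a dense n x n linear system Ax = b and
    convert the elapsed time to GFLOP/s using the conventional
    Linpack operation count of 2/3 n^3 + 2 n^2 flops."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(a, b)          # LU factorization + triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    # Scaled residual: a correctness check so speed doesn't come from wrong answers.
    residual = np.linalg.norm(a @ x - b) / (np.linalg.norm(a) * np.linalg.norm(x))
    return flops / elapsed / 1e9, residual

gflops, resid = linpack_style_gflops()
print(f"{gflops:.1f} GFLOP/s, scaled residual {resid:.2e}")
```

Note everything the sketch exercises, and everything it doesn't: the working set is a single dense matrix with regular, cache-friendly access patterns, so memory latency, interconnect bandwidth, and I/O barely figure in the score.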
How, then, does one identify the "most powerful computer" in the world? Unfortunately, there is no simple answer, and the fact that the HPC Challenge benchmarks are underutilized today illustrates the difficulty. In addition to high-performance processors, science and engineering applications require a low-latency, high-bandwidth memory subsystem (many scientific applications are memory-bound, not compute-bound) and a low-latency, high-bandwidth processor interconnect (critical for scaling applications to large processor counts). In addition, many science and engineering applications involve substantial movement of data between memory and the file system. So the performance of the I/O subsystem is also important. The HPC Challenge benchmarks cover most, but not all, of these (and other) critical performance characteristics. But how does one convert a large number of benchmarks, like the HPC Challenge benchmarks, into a composite score to enable the computers to be ranked?
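One simple possibility, offered here as an illustration rather than as anything the HPC Challenge suite itself prescribes, is the geometric mean of each machine's scores normalized to a reference machine. All of the benchmark names and numbers below are hypothetical:

```python
from math import prod

def composite_score(results, reference):
    """Combine several benchmark results (higher is better for each)
    into one number: the geometric mean of the machine's scores
    normalized to a reference machine. The geometric mean keeps any
    single benchmark from dominating, but the choice of reference and
    of which benchmarks to include still shapes the ranking; there is
    no neutral weighting."""
    ratios = [results[k] / reference[k] for k in reference]
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical machines and numbers, for illustration only.
machine_a = {"hpl_tflops": 2570.0, "stream_tbps": 180.0, "randomaccess_gups": 35.0}
machine_b = {"hpl_tflops": 1760.0, "stream_tbps": 430.0, "randomaccess_gups": 120.0}
ref       = {"hpl_tflops": 1000.0, "stream_tbps": 100.0, "randomaccess_gups": 50.0}

print(f"machine A: {composite_score(machine_a, ref):.2f}")
print(f"machine B: {composite_score(machine_b, ref):.2f}")
```

With these made-up numbers, machine B outranks machine A despite a much lower dense-linear-algebra score, which is precisely the kind of reordering a multi-dimensional metric can produce, and precisely why agreeing on one is hard.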
Perhaps the best way to rank the performance of a computer system would be to have it execute an average workload found at a number of supercomputing centers and select the top performing computer using this metric. This is what is done at Lawrence Berkeley's NERSC with their ESP (Effective System Performance) metric, which is regularly used in their computer procurement process. Although this approach works well when comparing computers with similar architectures, it is not easy to apply if there are significant differences in the architectures. In this case, substantial software porting and optimization efforts, similar to those under way at NCSA in support of the Blue Waters petascale computer and at Oak Ridge National Laboratory in support of its upcoming Titan petascale computer (a hybrid CPU-GPU computer), would have to be undertaken to ensure that applications, designed for earlier computers, are taking advantage of the special features of the computer. Clearly, this approach could entail an enormous amount of work.
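In skeleton form, a workload-based metric of this kind reduces to a weighted time-to-solution over a representative job mix. The sketch below is a simplified stand-in for a procurement metric like NERSC's ESP, not its actual definition, and the applications and timings are invented:

```python
def workload_score(job_mix, runtimes):
    """Score a machine by its weighted time-to-solution over a
    representative job mix (lower is better). `job_mix` maps each
    application to its share of the center's workload; `runtimes`
    maps the same applications to measured wall-clock hours on the
    machine under evaluation. A simplified stand-in for a metric
    like NERSC's ESP, not its actual formula."""
    return sum(share * runtimes[app] for app, share in job_mix.items())

# Hypothetical workload shares and timings, for illustration only.
mix = {"climate": 0.40, "md": 0.35, "cfd": 0.25}
machine_x = {"climate": 120.0, "md": 90.0, "cfd": 200.0}   # hours
machine_y = {"climate": 150.0, "md": 60.0, "cfd": 180.0}   # hours

print(f"machine X: {workload_score(mix, machine_x):.1f} weighted hours")
print(f"machine Y: {workload_score(mix, machine_y):.1f} weighted hours")
```

The catch noted above is hidden in the `runtimes` table: producing those numbers fairly across dissimilar architectures is exactly the porting and optimization effort that makes this approach so expensive.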
I will go out on a limb and state that Blue Waters, the IBM PERCS system being deployed by NCSA this year for the National Science Foundation, will be "the most powerful computer in the world" when it officially comes online next year. Will it be the #1 computer on the Top500 List? It may or may not be; the Japanese Kei supercomputer as well as Lawrence Livermore's Sequoia and Oak Ridge's Titan supercomputers and any new computers deployed by China may score higher on the Linpack benchmark. But, with its high sustained (≥1 PF/s) and peak performance (≥10 PF/s) for a broad range of science and engineering applications using powerful eight-core chips (~250 GF/s), high-performance memory subsystem (~5 PB/s) and interconnect (~500 TB/s bisection bandwidth), coupled with its world-class I/O subsystem (≥4 TB/s), Blue Waters will solve a wider range of science problems in less time than any other computer that will be available in the 2012 time frame. Isn't this what we mean when we crown a computer "the most powerful computer in the world?"
So how can we unambiguously identify the #1 computer in the world? I don't have an answer, but we certainly can't continue to use the one-dimensional Linpack benchmark. The days when this was a useful metric are over. In fact, perhaps we need to ask if a simple ranking of this kind is relevant today, given the wide variety of computational science and engineering problems being tackled, each with its own special demands on the computer system. We all want to be ranked at the top spot, but our mission is to operate systems that truly support science and engineering discovery. What really matters is the time to solution for the problems that are important to you.