Expanded knowledge

05.28.15 -

To learn more about the future of open research computing, Access' Barbara Jewett spoke with Blue Waters project leaders Bill Kramer and Cristina Beldica.

What is happening in scientific computing?

Bill Kramer: You cannot do any type of science and engineering without computing and data analysis these days. Even straight theory is no longer being done without proving, through a computer model, that your theory has substance and conforms to some amount of observation. The other thing we are seeing is a much tighter cycle of interaction between computing/simulation and data/observation. They support each other in a tighter and tighter cycle and in many more disciplines now, even in science that has traditionally been purely observational. There is no longer one set of people doing computing and another set observing data; it is all an interactive, integrated cycle.

Cristina Beldica: One thing that modeling enables is designing smarter experiments. You need fewer experiments, you can target specific areas, you don’t need to waste resources building physical models—it becomes much more productive and much more efficient.

Scientists talk about time spent moving data—days, weeks, months.

BK: There are two costs to moving data: energy and people. It takes more energy to move a piece of data than to do a compute operation on a piece of data. And that gets worse as you move the data further and further away. A bigger cost is how much effort people expend moving data and how long they have to wait, which means they're not doing the most productive thing that they can. We tried to solve that to the extent we can on Blue Waters by putting all the services in a tightly integrated space, a tightly integrated system, to minimize data movement for users so they can be more productive. Blue Waters users like the fact that they do not have to move their data much; they can do their simulations, modeling, post-processing, and visualization all right there.

For instance, we were able to reduce the time it takes for Tom Jordan and his team to get their results. Their code was already very efficient for the big earthquake simulation part, but it was taking them months to move the data so they could analyze it and run the post-processing, which required running tens of millions of jobs. Plus they had to distribute their data across a number of smaller systems. By using Blue Waters both for their simulations and post-processing, and not having to move the data, they were able to reduce their time to solution from multiple months to just two weeks. So that is an example of how Blue Waters was able to craft a solution to a science problem that arrived much sooner than it would have in the past.

What about the push to exascale? I don't hear much about that lately.

CB: Our mantra has always been, and continues to be, increasing researchers' productivity, which means sustained performance. It is all about what science gains are enabled by using a balanced system, not some number of FLOPS or bytes that the supercomputer achieves on a given benchmark.

BK: The exascale program as it was originally envisioned five years ago is not at all the way it is now presented. The schedule, the scope, and the cost were originally all overly constrained, so it was not feasible. But there are plans in multiple countries, as well as the U.S., to still reach a level of computing that some label exascale or exaflop. The target is not 2018 anymore. Even China is backing off 2018 now, and they were the last ones maintaining that year. The U.S. has 2024 as the planned target for a program that is not yet funded; Japan is trying for 2021-22 for a program that is only partially funded.

Would it matter if the U.S. does not move forward?

BK: Yes. We absolutely need computational and data analysis resources. Increasing from where we are today to 100x or 1,000x what Blue Waters can do is certainly justified by the science goals of many domains and by what could be achieved with that amount of resources.

I think it is fair to say there are many groups, particularly in the open academic community, that would be disenfranchised if NSF did not support increased capability at the high end of the computing spectrum. DOE is funding what they need for their programs, their mission. They are not provisioning their budget to support all the science needs that have to be at this end of the spectrum to make advances and to accelerate the ability to compete both scientifically and industrially. There have to be corresponding investments across other organizations of the government to enable all the important projects to continue, to expand, and to do more discovery.

At NCSA we position ourselves as providing systems and services that enable teams with the most challenging problems to realize sustained performance, which means doing their work in less time and doing it better. As Cristina said, we believe balanced machines are the best way to get productivity for the broad range of science users. It is not just how many extreme FLOPS are put into a system but also how much memory, storage, and I/O capability the system is designed to deliver. By the way, did you know we have applications that can't do their work without using 85 percent of Blue Waters' memory, which is much more than anybody can possibly use on other systems? These users can't do their science without the memory; the same goes for the interconnect and for the storage. It's not just the number of FLOPS. We build systems that can be used for real science results, so we look to maximize the return on the investment across a broad range of disciplines.

We have new teams with Blue Waters allocations.

BK: And those point to trends in how people will use these resources in the future. One of the things I think we are showing now is that once a team makes a frontier science calculation, a breakthrough or grand challenge calculation at significantly larger scale and impact than before, others quickly follow with new problems at that level. The breakthrough may be the number of atoms they simulate, the resolution they use, how much material they are able to model, or how long they run. Once a team does that, you might count it as a "best of breed" calculation, one that leaps ahead of everybody else.

We've seen this with supercomputing for going on 40 years now: the first best of breed calculation is always substantially challenging. But once completed, it sets the standard, and then the community takes other problems that need that level of computing; these "early community adopters" solve other problems at the new scale and complexity. So Klaus Schulten may solve the HIV capsid, but there are other problems that need similar amounts of computing and equally sophisticated software. Then maybe a few years later, solving problems at that scale becomes "community practice," where many teams and individuals are working at the previous best of breed level. Then the same team, or a different team, does the next best of breed calculation, which would be the next stretch for a machine that is five or six times the sustained performance of Blue Waters. By having these leadership-class machines you create not only the best of breed calculation but also the large-scale community that wants to work at that level, too.

Think back to the first person who did an atmospheric simulation, on the fastest supercomputer of the time. By today's standards it was crude, with very little performance, nothing like what we have today. But that was the first time somebody could envision using computers and a model to calculate atmospheric clouds. And now we have extended weather forecasts and many other weather and climate activities, all because that best of breed step was taken on a supercomputer of its day.

NCSA's first supercomputer, impressive for the time, wasn't even as fast as today's laptop.

BK: And the software was much less efficient. So it is not just the computer's speed; it's also the improvements in software and in algorithms. You have to develop new algorithms not just because the hardware changes but because the problem you are trying to solve is harder, bigger, more challenging, and you need to develop new ways of dealing with it. So software improvements are at least as important as hardware improvements in meeting these needs. If you look at much of the software that runs on Blue Waters, there were years of investment in making that software better to solve petascale problems. Teams are motivated to do that if they are confident that there will continue to be investments in large-scale resources.

Other things for the future?

BK: Besides the need for more computing resources, I foresee tighter integration of data and computing on the same system to minimize data transfer and data movement. Researchers have more complex workflows (the steps you have to go through to get your result, not just to compute something) and more dynamic workflows. We are going to see more teams doing work at increasingly large scale as their field moves up from best of breed to community standard practice.

CB: We talked at the beginning about how computing increases the productivity of scientists and of industry, how we design experiments and make better use of data. But an equally important aspect of computing is that it expands the realm of knowledge; it creates new knowledge by itself and in combination with experiment and theory. That is very, very important. We have already seen people achieve results that would not have been possible without Blue Waters. These science and engineering groups and their whole communities are energized by these new results; they have gained momentum, and they are committed to maintaining it by investing in their code base, adding features, and making their code more efficient, targeting the next generation of growth. This continuous growth is going to fuel the next discoveries and our understanding of the world.

Editor's Note: Cristina Beldica recently left NCSA to pursue other opportunities.

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.