Serving the world of science

05.22.12

Alan Blatecky, head of the National Science Foundation's Office of Cyberinfrastructure, recently chatted with Access' Barbara Jewett about the new visions at OCI and NSF.

The Advanced Computing Infrastructure plan has five strategies. Is there one that you feel is more important or more urgent to address than the others?

We carefully selected those five. It is important to understand that what we're looking at is an ecosystem, if you will. One of the problems we've had in the past is people being too focused on one thing or the other. If you look at what we're doing in CIF 21 [Cyberinfrastructure Framework for 21st Century Science, Engineering and Education], you really can't effectively focus on one strategy without doing things in other areas. For example, the first ACI strategy talks about foundational research in parallelism and concurrency and so forth. That is absolutely critical, but the point is you then need to connect that to the science being done, which is what the second strategy is about. The third strategy says that while you are working on the first two strategies, you also have to have infrastructure to work on, and if you have that then you need expertise and people. So which one is more important depends on which hat you're wearing at the moment.

One of the criticisms that many in the HPC community often level at NSF is that NSF funds hardware, but not software development. CIF 21 has software included in its framework. And applications development is ACI strategy number two. Is this signaling a shift in NSF's thinking?

Yes, you are absolutely right on target. Software and data are becoming ever more important in computational science. Several years ago, data was often considered a by-product of science. Today data and software are absolutely essential to every science; in fact, it seems pretty clear that advances in computational capabilities and innovation require significant research efforts in software and algorithm development.

As I read through the ACI strategies, it struck me that they seemed to be describing multi-site, multi-resource collaborations like the current XSEDE project. Am I correct in that? You're bringing everybody to the table?

Right. This is because some issues and problems can only be solved by involving multiple disciplines. EarthCube is a good example. This is a major new initiative and partnership OCI has with the Directorate for Geosciences. We brought together a wide range of communities to better understand the earth from its center to the sun. This meant that we had to engage a broad range of geoscientists and infrastructure developers. To kick this off, we conducted a four-day charette with these communities. At the end of the charette, 60 percent of those who participated said they needed data outside their own area and discipline to do the work they needed to do. [Editor's note: A charette (pronounced shuh-ret) is a method, usually a series of meetings, to capture the vision, values, ideas, and talents of all interested parties with the goal of creating and supporting a feasible plan.] When we start looking at ways to address grand challenge problems such as this, it is evident that they require an enormous number of capabilities and a wide range of expertise.

Where do the NSF-funded centers fit into this plan?

If the centers are not already looking seriously at the ACI strategies, they ought to do so now. The NSF centers need to effectively participate in all these areas to address the needs of science. It's not just about having a computational resource; it's about supporting the entire computational ecosystem. The challenge is how you leverage these to support science and education.

Tell me more about strategy four, the education and workforce programs. How do you envision this being implemented?

That, I'll be honest, is one of the toughest things we have to do. Because ACI is an integrated ecosystem, it requires people with expertise in many different areas. For example, we have already talked some about the importance of computational science and data-enabled science. We need that sort of expertise in order to do science, and that is much more difficult than saying "Oh, let me give you a course in how to write a new algorithm." It's much more complex.

So the issue is how do we build a new education and workforce pipeline for computational and data-intensive science. We need to have it extend across the entire spectrum, from the faculty and the researcher, to students and post-docs, to the technician. How do we begin to expand and develop that much larger and more complex education and workforce capability? This also has to include addressing how I understand the problem and construct it in a way I can solve it.

Many have said GPUs are the way of the future for scientific computing, and that NSF probably will only fund GPU machines in the future. Care to comment?

I think that's too limiting. I think GPUs indeed are going to have a tremendous capability in the future, but there are lots of other things coming up as well. For example, there's a lot of press about clouds, and clouds will indeed probably play a significant role, but we need to understand what they can do better than what exists today.

What we're trying to do with ACI is also look at the distributed nature of the compute and data side. For example, the increase in distributed capabilities coming from sensors, smartphones, tablets and laptops is already dramatically changing the computational world and will continue to do so in the future. With ACI, we need to look at the whole range of capabilities emerging, not just the very high end, nor just GPUs. What do we need to do to best address the science, and most effectively increase productivity and national competitiveness?

So how are we going to fund all of this? Will it still be on the 5-year funding cycles?

Well, some hardware is still going to be funded on a five-year cycle, but what's going to happen, and you're already seeing it on the data and software side, is more focus on multiyear programs and on sustainability. It is clear that while hardware may be obsolete in three or four years, software stays around for 10-20 years and data is forever. So we are looking at programs to address the much longer timeframes of these activities, what that means, and how to get there.

It is also important to note that ACI is not just an OCI activity; it involves all of NSF. For example, CISE (Directorate for Computer & Information Science & Engineering) will be announcing significant foundational research efforts in strategy number one; MPS (Directorate for Mathematical and Physical Sciences) will work to develop new mathematical algorithms. These will be multiyear programs.

We need to start looking at longer-term programs to be able to start focusing on those sorts of activities. A strict model of fixed three- to five-year programs is not adequate; the issues are much more nuanced and complex.

NSF starting down a new path?

Exactly. We put down [in ACI] what we think are the issues. This includes a focus on enabling and facilitating foundational research in computational and data-enabled science and engineering, and broadening the use and applications to all disciplines.

Anything else you want people to know?

If it's not clear, we're trying to talk about ways to significantly broaden and extend computational capabilities, not only to all the disciplines, but also to make them available to everyone. How do we provide computational access and expertise to the entire science community? What should NSF be doing for those communities that need enabling software, algorithms, and data rather than just large computational resources? In essence, how do we democratize computational and data-enabled science so that it permeates science and education?

And doesn't that benefit all of us?

Yes. And that is what it is about. Plus, I'd point out, it's also a national competitiveness issue. Being able to effectively use computational tools, methods, and resources across the entire scientific spectrum is one of the most important things we should be doing to move forward.