Strong and connected

12.01.08

Ed Seidel, director of the National Science Foundation's Office of Cyberinfrastructure, has worked with NCSA for almost two decades. He's been a postdoc and research scientist at the center, a supercomputer user, and a collaborator on various projects. Seidel sat down with Access' J. William Bell to discuss NSF's vision and the perspective his career has given him on cyberinfrastructure.

Q. Set out the current NSF cyberinfrastructure vision and what you're hoping to accomplish with the investments being made.

A. Science is not just about data or simulation or computing or networks or visualization, but about complex problem solving. That is really where NSF is going, particularly in what we call transformative research: work that really transforms the way scientists carry out new kinds of science. Any single problem that promises to be transformative often involves every element of cyberinfrastructure, so cyberinfrastructure needs to be very well integrated.

Q. And how is NSF organizing those efforts?

A. [We have] four primary areas of development that need to be continued and ramped up in many cases.

The first one is virtual organizations for distributed communities, and that's very important for developing problem solving environments. The second is high-performance computing. The third is data, visualization, and interaction. The fourth, but at least as important, is education and workforce development. And those four in a sense underpin every activity in cyberinfrastructure that's being carried out at the foundation and worldwide.

High-performance computing is probably the most prominent among them. That includes all of the Track 2, Track 1, and TeraGrid efforts, and now the new XD solicitation, which stands for Extreme Digital. High-performance computing is doing very well in terms of supporting the national scientific computing community and research and development.

[Author's note: Track 1 refers to the NSF award won by the University of Illinois, NCSA, IBM, and the Great Lakes Consortium for Petascale Computation that will build Blue Waters, a sustained-petascale system for open scientific research. Track 2 refers to awards for a set of very powerful, but smaller, machines at several sites around the country. And Extreme Digital is the NSF award that will be a follow-on to the TeraGrid project.]

Q. Are those efforts going to look like the supercomputing centers programs of the last 20 years or is there some fundamental shift underway?

A. There are some elements of both, but there is a fundamental shift underway.

With the integration of the TeraGrid and beyond with XD, the generality of computing -- resources being available and it not mattering where the jobs are done -- is going to become much more fundamental. Even more than it is now. And that's going to create a shift to the national cyberinfrastructure that people use without worrying as much about the individual details.

But when it comes to the very specialized machines like Blue Waters, those will require a lot of teamwork to develop the applications to work properly at scale. So those will be singular points in the big national cyberinfrastructure. Those will be used, I hope, in different ways. They won't be time-shared with thousands of users. There will be a small number of users using those machines to solve only those breakthrough problems that couldn't possibly be attacked any other way.

Q. What do people focusing on sustained-petascale computing need to be looking at?

A. Going beyond developing applications for such machines, which is a huge challenge, there are other major issues to consider. Just this summer, there was an entire issue of Nature dedicated to the data challenges that we face. In June, the month I accepted the job to come onboard at NSF, the cover story of Wired magazine was "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," and I thought that was kind of ironic, to come to NSF when that was the headline story. It was a provocative title, but it was about the fact that there will be so much data out there and the modalities of doing science will change so much.

Q. And how would that be reflected in a resource like Blue Waters?

A. My background is in black hole simulation and computational astrophysics. In my own area, I'd very much like to see something like the gamma ray burst problem solved or at least explored on a machine like Blue Waters.

If you add up all the arithmetic you have to do on a single gamma ray burst calculation -- when you integrate all of the microphysics, the radiation transport, the nuclear physics, the hydrodynamics, the general relativity, the gravitational wave aspect, neutrinos -- you find that you will have to sustain petaflops in order to do the calculation in a period of days or even weeks. A single simulation! That can generate, say, a 5 petabyte file, and how do I actually analyze that? How do you manage all of that, categorize it, even label it and move it?

There are so many different challenges in that one problem. No one's an expert in all of those areas. And I didn't even scratch the surface of developing the software, the visualization capabilities, the libraries.
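[Editor's note: As a rough illustration of the scale described above, the Python sketch below works through the arithmetic. Only the 5 petabyte file size and the "days or even weeks" time frame come from the interview. The sustained rate of 1 petaflops, the one-week run, the 10 gigabit-per-second network link, the decimal definition of a petabyte, and the function names are all assumptions chosen for illustration.]

```python
# Back-of-envelope arithmetic for a sustained-petascale simulation.
# Assumptions (not from the interview): 1 petaflops sustained, a 7-day run,
# a 10 gigabit-per-second link, and 1 petabyte = 10**15 bytes.

PETA = 10**15
SECONDS_PER_DAY = 86_400


def total_operations(sustained_flops: float, days: float) -> float:
    """Floating-point operations performed at a sustained rate over `days`."""
    return sustained_flops * days * SECONDS_PER_DAY


def transfer_days(size_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move `size_bytes` at a constant `bytes_per_second`."""
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed: 1 petaflops sustained for one week.
    ops = total_operations(1 * PETA, days=7)
    print(f"{ops:.3e} operations")

    # Assumed link: 10 gigabits per second = 1.25e9 bytes per second.
    days = transfer_days(5 * PETA, 1.25e9)
    print(f"{days:.1f} days to move 5 PB")
```

Under these assumptions, the run performs about 6 x 10^20 operations, and simply moving the 5 petabyte output over the assumed link would take roughly 46 days, longer than the simulation itself.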

Q. And that's just one example.

A. Petascale immediately leads into the big data problem. These are the kind of problems that organize entire communities worldwide. In every discipline, there are dozens of problems of this caliber and class.

Q. Tell us a little about the PetaApps and PRAC programs that are going on currently and what role those are going to play in those sorts of problems.

A. If you invest hundreds of millions of dollars in hardware, you surely have to think about how to invest in developing the applications that will actually use that hardware. PetaApps is meant to get those communities together to have them look at how everything from the algorithms to the software to the basic physics will be scaled up for a machine like Blue Waters.

PRAC is another smaller-scale program that provides money for people to travel and develop their collaborations. So you might say that's money to develop virtual organizations.

Q. What should the centers and institutions building the Track 1 and Track 2 systems be doing to foster the kind of integration you've been describing?

A. Well, I think the centers are trying very hard, both to be the best they can be and to be part of the bigger picture of the national cyberinfrastructure. People understand they have to be both strong and connected.

I think they also have a responsibility to serve not only the science communities, but also develop computational science and cyberinfrastructure as a discipline. The centers have a critical mass of staff and faculty around their campuses that are very important in terms of training the next generation. They're doing that, and I think we have to find ways of doing more of that.

Q. What role do the centers play in training that next generation?

A. Individual students and postdocs must be trained in using the national cyberinfrastructure. When they start developing codes [for the simulations that drive their research], they frequently don't have the right training. When I was a student, I didn't. I was very naïve and was a bad Fortran programmer. And I thought, because of my training, that the theoretical physics was the hard part.

But the hard part is figuring out how to turn that into a problem that can be solved on a computer or with cyberinfrastructure. Providing a stronger sense of architecture for how everything fits together and then making sure that the campuses are integrated would go a long way toward training everyone as they're being brought up as graduate students and undergraduates.

Q. Let's talk about the significance that NSF cyberinfrastructure programs have for average citizens. What impact do these systems and related efforts have on my mom and dad?

A. There are many levels to that question, but I think the best example for people, though they may not know it, is Mosaic coming out of NCSA. That's a beautiful story of something that came out of NCSA in the early days that blossomed into the total revolution in the way we all access information. [Author's note: Mosaic was the first widely used graphical Web browser. It led to Netscape, Internet Explorer, and the broad use of the Web for communications and commerce.]

On a more scientific level, a way NSF's cyberinfrastructure programs affect the public would be hurricane projections. There are lots of studies that show that even if there is no destruction, hurricanes cost millions of dollars just for evacuations. If hurricanes were forecast more reliably, we would better know to evacuate some areas and not others.

In the case of Katrina, if we'd had more information in advance [through better, simulation-based hurricane forecasting], potentially we could have saved thousands of lives and millions and millions, even billions, of dollars. The insurance industry would be less hard hit, recovery would be faster, and so on.

Q. I was introduced to you years ago as a supercomputer user and an application scientist. What sort of insight does that history give you in your current role?

A. It's critical. Because I come from the applications side, I hope people will feel comfortable giving me the kind of input that is needed to support the growth of this area at the NSF. I hope to be accessible to people and carry their perspective.

The other thing is that my experience, particularly at NCSA, taught me the fundamental importance of teamwork, of working with people from other disciplines and computer science. I want to promote that. NSF is the perfect cauldron for that because we represent all the disciplines and computer science.

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.

XSEDE is supported by the National Science Foundation through award ACI-1053575.