A dynamite system
07.15.10
Irene Qualters joined the National Science Foundation in December 2009 as a program director in the Office of Cyberinfrastructure (OCI), with responsibility for the Blue Waters project. She recently talked with Access' Barbara Jewett about Blue Waters, as well as the future of high-performance computing.
You have a really extensive background in private industry: Merck, Ageia Technologies, SGI. How did you come to make the leap into government as a program director for OCI?
Having always worked in industry, I really had never seen myself in the government. But I was really intrigued with the leadership role that I thought only a few entities could exert in high-performance computing (HPC). I deliberately pursued NSF because I thought NSF had that potential to play such a role, and second because my views personally resonate with the open science approach of NSF. And as I researched the vision, called the Cyberinfrastructure Framework for 21st Century Science and Engineering, CF21, I became quite excited and I thought that perhaps my background would be different enough and complementary enough that it would be a really good fit at NSF. I made quite a deliberate choice.
So you obviously feel your private sector experience is an asset to you at OCI.
While NSF itself doesn't have vendor contracts, I've found that my background can help the Blue Waters PIs when they are interacting with vendors, as I understand from a business perspective what's important and what isn't. In industry, to make a relationship work it has to work for both parties. In some cases I may be more sensitive to what a for-profit entity views as attractive. For a vendor, that's not just about money.
Coming in at the middle of a project can sometimes be very challenging. What challenges, if any, have you encountered with Blue Waters?
I must say one very rarely gets a chance to start at the ground and see a project all the way through. The challenge with Blue Waters is that it's a very large project, it's moving, so there are lots of pieces. One thing that has been helpful for me and hopefully helpful for the team is that I had previously worked with Bill Kramer [NCSA's Blue Waters deputy project director]. I was a vendor, at SGI, and he was on the other side at NASA and later DOE. So we didn't need a getting-acquainted period before getting into the work.
You are fairly new to NSF and to the Blue Waters project, but have you found anything that you are particularly enjoying about the project?
Oh, yes, absolutely. Blue Waters is not simply about standing up a big system over many years and making sure that it all goes swimmingly. I also have the PRAC (Petascale Computing Resource Allocations). So I get a chance to make sure that a portfolio of computational science that is going to be done from the beginning on this system is noteworthy, and, frankly, quite exciting. I've thoroughly enjoyed working with the PIs. And I will also say there are very good project directors and managers at NCSA as well as a very good panel associated with Blue Waters, so working with everyone has been very enjoyable. It's arguably the most exciting project going on in OCI, and I get to say that because I'm so enthusiastic about it! I love getting up in the morning and going to work.
As we work through the PRAC process, petascale computing applications in a wider variety of fields will most likely become available. How do we ensure that effort has the broadest impact possible?
That goes back to one of the key aspects of the OCI vision, which is really about making sure that scientific and engineering theory, with the experimental studies, have the third leg of the stool, which is the computational science. One of the people who has been most instrumental in making sure that Blue Waters from the beginning has the broadest impact possible is the project director, Thom Dunning. He's been making presentations to various groups, and we've also actively reviewed the portfolio that we currently have, determining what additional scientific disciplines we should approach to get them interested in thinking about Blue Waters.
There's a very strong team at NCSA that is dedicated to working with the applications teams. This does two things. One, it ensures that any team is not floundering on any of the specifics and we know what they need, and two, it also allows knowledge to be shared across teams.
We're also more informally collaborating with the users of the TeraGrid. For many teams, TG is used as a vehicle to develop applications but it is also one of the key places from which to gain entry into Blue Waters. We want to do more.
I would say there is quite a bit of evangelism going on and we are all, in our own way, being evangelists.
Let's go back to the vision. NSF and OCI have many initiatives underway that most likely will impact scientific computing. How will these initiatives influence the research community and the CI and OCI vision? And what do you feel that vision is becoming?
The vision first introduced by Dan Atkins, and taken further with CF21, developed and led by Ed Seidel, is continuing to evolve. We're fleshing out the strategy from that vision and evolving specific plans to implement it. I'll ramble a little bit and say it's not just about the technology resources, it's about the skills to effectively use those resources. And it's not just about the scaling up, but how do we make it easier for the scientists and engineers who need these computational tools to operate at various ranges, up and down. And, finally, it is about an operating model to sustain the capabilities and skills.
I've heard many people talk about the educational needs.
It is not insignificant, this educational component. Computational science by its nature is interdisciplinary, so that makes it more complex than a single domain. You can't just say, "Here's a curriculum of education in computational science." It needs to be integrated with physics, with math, with biology; it doesn't exist in isolation. There is a nontrivial challenge associated with the educational piece. How do you nurture and develop the skills you have, and how do you promulgate them so that the scientific and engineering disciplines can have the third leg of the stool as strong as theory and as strong as experimentation?
Data-driven discovery is also changing how science is done, with projects that will collect petabytes of data every day. What sort of role do NSF and OCI see themselves playing in data-driven discovery; is that part of CF21?
At OCI, the long-term access, preservation, and analysis of data is a key component of our vision.
The technical capabilities of Blue Waters include very large memory, data, and an extremely high I/O bandwidth. It's important to note that Blue Waters will be a dynamite system for data-intensive problems. And that's one of the areas that we're doing some hunting, if you will, for good science and engineering problems to solve because this is really going to be a top-notch system on which to do those.
When you get into the high-data area, visualization becomes increasingly important. Later on this year we'll be reviewing and engaging more on the visualization side for Blue Waters, which is a very strong platform.
The Blue Waters project had barely begun when many in the scientific HPC community were already talking about exascale computers. What do people focusing on exascale computing need to be looking at?
First of all, I'll say we wouldn't be scientists, and maybe we wouldn't even be humans, if we didn't look beyond where we are. And I think the sustained petascale is going to keep us busy for quite some time.
Having said that, it is already clear that to get to exascale computing, incremental scaling is insufficient. Fundamental changes, for example, with failure containment and failure resilience will need to be addressed. For example, for the component rate that we'll be at with exascale, parts will be failing constantly. That is nothing new from petascale. However, undetected failures may well be much more significant.
So the software needs to be able to be sufficiently resilient that it can tolerate the failure. And that can go all the way into the algorithms themselves with undetected errors. The ability of algorithms to accommodate small, occasional errors for some problems will be extremely difficult. That's an example of an area that will need research and adoption of new approaches.
Another area, currently used to accommodate failure, is checkpoint restart. At the very largest systems, the very largest programs running, a system can take up to 30 percent of the time doing checkpointing rather than running applications. Scale that up to an exascale system, people don't even want to think of how much of the system could be consumed just writing off partial files in case the system goes down when you're computing. So that's an example of something that is very serious that would need to be tackled in order to make exascale viable.
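[Editor's illustration, not part of the interview.] To put rough numbers on the checkpoint trade-off described above, here is a minimal Python sketch. It rests on assumptions the interview does not state: components fail independently at a constant rate, the checkpoint interval follows Young's first-order approximation (a standard textbook estimate, not a method attributed to Blue Waters), and every figure below (component MTBF, component counts, checkpoint cost) is hypothetical.

```python
import math


def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """Mean time between failures for a system of n components.

    Assumes independent components with constant (exponential) failure
    rates, so the failure rates simply add up.
    """
    return component_mtbf_hours / n_components


def young_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


def overhead_fraction(checkpoint_cost_hours: float,
                      interval_hours: float,
                      mtbf_hours: float) -> float:
    """First-order estimate of time lost per hour of useful work.

    First term: time spent writing checkpoints.
    Second term: expected rework after a failure (half an interval on
    average, once per MTBF).
    """
    return (checkpoint_cost_hours / interval_hours
            + interval_hours / (2.0 * mtbf_hours))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    component_mtbf = 1_000_000.0   # hours per component (assumed)
    checkpoint_cost = 0.1          # hours to write one checkpoint (assumed)

    for n_components in (50_000, 500_000):
        mtbf = system_mtbf_hours(component_mtbf, n_components)
        interval = young_interval_hours(checkpoint_cost, mtbf)
        lost = overhead_fraction(checkpoint_cost, interval, mtbf)
        print(f"{n_components} components: system MTBF {mtbf:.1f} h, "
              f"checkpoint every {interval:.2f} h, overhead {lost:.0%}")
```

Under these assumed figures, a tenfold increase in component count raises the estimated overhead from roughly 10 percent to roughly 32 percent when the checkpoint cost stays fixed. The approximation becomes less reliable as the checkpoint cost approaches the system MTBF, so treat the output as a trend, not a prediction.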
You want to have problems that can scale up and down, and you need that continuum, so that's another reason why another really key area associated with exascale is the approach to parallelism. Scientific parallelism is largely MPI-based right now. Given where the multicores are going at the processor level, that parallel programming paradigm for many problems actually won't scale to exascale. So that's a serious issue that is being reconsidered.
I'm very excited about addressing the problems that are already identified for Blue Waters. And while I'm enthusiastic about exascale, I am very hopeful and confident that Blue Waters' impact at petascale is going to be extremely high. The project was designed for sustained petascale, and for me, the sustaining, the implied ease of use, the ability to solve a wide variety of problems, that's really what keeps me focused, keeps me excited, keeps me really hopeful for what it will accomplish.