Drawing numbers

07.09.13 -

Access’ Barbara Jewett visits with Dave Semeraro, leader of NCSA’s Advanced Digital Services visualization team, to learn how his team enables research.

How would you describe visualization to someone who is not familiar with it?

You need to understand that a computer just generates numbers. That’s all the computer knows how to do. Visualization turns the numbers the computer generates into something visual we can more easily relate to. Humans are primarily visual interpreters of information. Most of our cognition comes from our visual perception of the world. When you are simulating something physical on the computer, like the climate or the formation of a new star in a galaxy, it’s meaningful to look at that from a visual standpoint rather than a table of numbers. So if I can take a representation of the globe and color the surface of the globe based on temperature, you can see over time how the average temperature in different regions of the globe changes, based on, for example, added greenhouse gases. You could graph those numbers, but it’s more engaging and insightful to look at them in a visual sense.
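The globe-coloring idea he describes can be sketched in a few lines. This is purely an illustrative example, not NCSA's actual tooling: a hypothetical function that maps a temperature onto a blue-to-red color ramp, the kind of mapping a visualization applies to every point on the globe's surface.

```python
def temp_to_rgb(temp, t_min, t_max):
    """Map a temperature to an RGB color on a blue-to-red ramp."""
    # Normalize into [0, 1]; clamp values outside the expected range.
    frac = max(0.0, min(1.0, (temp - t_min) / (t_max - t_min)))
    # Cold temperatures come out blue, hot temperatures red.
    return (int(255 * frac), 0, int(255 * (1 - frac)))

# A toy "globe surface": made-up average temperatures (deg C) per region.
region_temps = {"arctic": -20.0, "temperate": 15.0, "tropics": 30.0}
region_colors = {name: temp_to_rgb(t, -30.0, 40.0)
                 for name, t in region_temps.items()}
```

Applied per-pixel over a rendered globe, and re-applied for each time step, this is how a table of temperatures becomes an animation in which warming regions visibly shift toward red.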

Are there different types of visualization?

The type of visualization that we do at NCSA is what is called scientific visualization. There’s another discipline of visualization called information visualization. Those are the sorts of visualizations that you might see on the Internet or in a magazine or newspaper. A good example would be an infographic. A classic infographic is the pie chart, such as one that shows where your income tax dollars go—which slice of the pie goes to defense, which slice of the pie goes to education, and so on. Here at NCSA, researchers we assist frequently have observational data. For example, water levels of rivers over time or how much precipitation certain areas received. Your local Doppler radar display is an important time-based visualization built from observed data. We have researchers who are combining Doppler radar data and simulation to better predict severe storms. That’s a combination of observed data and simulated data analysis and visualization.

Is visualization difficult for scientists to learn how to do?

No! It’s really not. In fact, one of the things that people don’t realize is that a scientist, during the course of developing his science and simulation applications, usually develops an entire suite of analysis and visualization tools and techniques that go along with them. One key aspect of that is the analysis part. Visualization without analysis is semi-useless. You could misrepresent the data. You could visualize the cloud surface of a severe storm without actually finding out what’s going on inside the cloud. Visualization experts know how to represent that information visually.

If scientists know how to do their own visualizations, why do we have visualization experts?

Well, science teams don’t typically employ people with a background in information presentation or visualization representation. They hire grad students in their discipline who are experts in simulating the physics the researcher wants to simulate but usually not knowledgeable in the physics of visualization representation. It’s a big enough job to keep up with the physics and mathematics of your field without having to worry about the extra burden of different visualization techniques. That’s why visualization consultants exist. You can’t be up on all the different visualization packages and still be a good scientist unless you are really, really good or a very artistic scientist.

Many visualization experts also have a background in art. On our NCSA visualization team, two of our team members have master of fine arts (MFA) degrees. Some people might not think that’s important, but it really is. Artists know how to present data in a meaningful way, how to draw the eye to the point the researcher wants to make with a particular visualization. Also, there are different techniques that come up that visualization experts know how to employ. There’s the whole world of volume visualization. And there’s different mathematics for doing photo-realistic rendering—to make clouds look like real clouds, and fire look like real fire. To make fluids and materials interact with light in a physically correct way. Now, whether that brings any more insight to an analysis can be debated. Do I learn more about a cloud or a thunderstorm by having it rendered photo-realistically? If I stand outside and look at a cloud or a thunderstorm coming my way, I really can’t tell physically what’s going on. So you can go too far in that realm. But at the same time, some of those techniques can be useful for scientific visual analysis.

The numbers researchers get from Blue Waters are larger than we’ve had before. What were some of the challenges in trying to develop visualization for Blue Waters?

The problem all visualization experts are facing today is the raw data size. That’s the main challenge. In the past, computing centers provided equipment to do computation and slightly smaller equipment for doing visualization and data analysis. Well, the size of the systems has grown to such an extent that it is not practical to provide a separate visualization supercomputer, for several reasons. One is the cost involved. Typically in the past, the visualization machine would be some significant fraction of the computation machine, maybe 30 percent. That’s many millions of dollars today. In the past, when we were spending less, the number was smaller and we could afford it.

The other aspect is power. Power is becoming expensive, and large machines use more power, so it costs a lot to run them. Visualization is what we call a “bursty” use of a supercomputer, especially interactive visualization. Researchers will create a visual representation and look at it, scratch their chin for a little while, and maybe do a rotation or change a few parameters. There’s a lot of wasted time there, where the machine is on and it’s not doing anything. When you are spending millions on a supercomputer, it’s tough to justify that.

Another reason is data. Data size is becoming a large problem because some of the analysis approaches and software techniques that were used for visualization in the past don’t scale very well to larger data sets. We were sort of cavalier in our software design and didn’t use our data structures in an efficient fashion, so there’s an effort going on now at several institutions to re-examine those classic visualization algorithms and rewrite them in a much more data-efficient way, so that you can do things like change a structure or change some visualization aspect and it won’t copy a whole bunch of data. It’s hard to explain without going into great detail.
There is a paradigm shift going on that’s moving away from some of the older software engineering techniques and moving into something that is more efficient from a data structures standpoint.
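The copy-avoidance idea behind that shift can be illustrated with a small, hypothetical sketch (this is standard Python, not the visualization algorithms being rewritten): a zero-copy view lets you work with a subregion of a large buffer without duplicating it, whereas taking a snapshot copies the data outright.

```python
# A flat buffer standing in for a large simulation field.
field = bytearray(range(256)) * 4

# Data-efficient: a memoryview slice shares storage with the original,
# so selecting a subregion copies nothing.
view = memoryview(field)[64:128]
view[0] = 255            # writes through to `field`
assert field[64] == 255  # the change is visible in the original buffer

# Data-inefficient: bytes(...) duplicates the slice; at supercomputer
# scale this kind of hidden copy is exactly what doesn't scale.
snapshot = bytes(view)
```

The same contrast, writ large, is why re-examined algorithms that pass views and references around handle petascale data where copy-happy designs stall.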

What do you see coming in the future of visualization?

Algorithms. The thing that is changing computing in general is also changing visualization. The move to many-core and multi-core processors has changed the way the software we’re now creating for visualization is evolving. It’s more of a hybrid, hierarchical memory usage model that we’re moving toward: MPI plus OpenMP for the many-core parts. There’s sort of a blurring, in that GPU technology is bleeding over into the CPU in terms of there being many, many threads of execution. It’s hard to say whether specialized graphics hardware will be discernible in the future. Where does the graphics chip end and where does the CPU chip begin? We’ve been seeing for a while now people doing general computing on GPUs, and with the increased capability and the extra parallelism of the new many-core platforms, it may be that this distinction goes away. And that’s a big jump on my part; there will be many people who disagree with me on that. There are many people, especially in the business world, who think that separate graphics hardware isn’t going anywhere, that it will always be a distinct part of the visualization landscape.

How else is visualization changing? We’re going to have to figure out the large data issue. In the past it was practical to move data around from one system to another, but at petascale and exascale it is not very practical to move a large amount of data from one geographic location to another, or even across the machine room floor; it can be expensive or take inordinate amounts of time. I/O costs are high. But that’s nothing new; that’s been a part of visualization forever. I/O costs are always a significant piece of what you’re doing.
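The hybrid model he mentions can be pictured with a loose, stdlib-only analogy (a real code would use MPI ranks across nodes and OpenMP threads within each node, not Python threads): decompose the data into coarse chunks, then reduce each chunk in parallel and combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_reduce(chunk):
    """Work done by one worker: reduce its own slice of the data."""
    return sum(x * x for x in chunk)

data = list(range(10_000))

# Coarse decomposition into four contiguous chunks (the "MPI rank" level).
size = len(data) // 4
chunks = [data[i * size:(i + 1) * size] for i in range(4)]

# Parallel work on each chunk (the "OpenMP thread" level), then a
# final combine of the partial sums.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_reduce, chunks))
```

The two-level structure, distribute coarse pieces first and exploit many threads within each piece second, is the hierarchical pattern the new visualization software is being written around.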

Anything else?

The core visualization support efforts in the Blue Waters and XSEDE projects have recently been combined into a single group as part of the NCSA Advanced Digital Services directorate. This is significant and makes sense, because all the researchers who are computing on Blue Waters are also computing through XSEDE on other NSF resources. Now that all the NCSA visualization support experts collaborate, there’s just better support at the center for scientific visualization on NSF resources, whether through XSEDE or Blue Waters. Some researchers are doing runs on Blue Waters and are lucky enough to have their data be small enough that they can move the data to other XSEDE resources or even their own local systems for analysis. Since we are familiar with all that goes along with that, we can help with data movement and with analysis of Blue Waters data on XSEDE resources. And of course, we’re always happy to assist researchers with doing their analysis on the Blue Waters system itself. We support a lot of large-scale visualization applications on Blue Waters. Our focus is on helping people get their science done. And whatever we can do that helps them get to that point where they’ve made a significant science contribution or discovery, that’s what we’re here for.

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.