(Cloud + super) computing = results

10.29.14

by Nicole Gaynor

Blue Waters and cloud resources are helping Vijay Pande's research group at Stanford analyze data to tackle serious diseases at the molecular level.

Can cloud computing replace supercomputers like Blue Waters in the future? No, says Vijay Pande, director of the biophysics program at Stanford University. He says both are critical to his study of serious diseases like Alzheimer's and cancer.

Pande's lab uses cloud computing through Folding@home and Google Exacycle to run many detailed molecular dynamics (MD) simulations of protein folding independent of one another.

"A lot of what we do is run the raw trajectories on Folding@home, or Google Exacycle, analyze it on Blue Waters, and spit it back out to Folding@home," says Pande. "If we just had the [cloud computing] Folding@home part without Blue Waters, we would generate a lot of data but we would have a hard time analyzing it," says Pande.

Folding@home is distributed computing software that Pande's lab developed and released in 2000. With Folding@home, people volunteer unused computing power on their home computers to help crunch numbers. Computers around the world run independent MD simulations and return their results to Folding@home. Google Exacycle works in a similar manner except that Google's computing infrastructure supplies all the computing power and researchers apply for time. Both of these are examples of cloud computing (also called distributed computing), which is great for raw computing power as long as input/output (I/O) and communication requirements are low.

On the other hand, Blue Waters supplies high-speed networking and storage and tight connections between nodes—all of the characteristics missing from cloud computing. Pande says this makes Blue Waters a completely different kind of resource and also a more flexible system. The sheer power available in each core on Blue Waters means the supercomputer can solve problems that are well-suited to cloud computing and also those that require more I/O and communication than cloud computing can provide.

The combination of Blue Waters and cloud resources using Pande's methodology allows his experiments to reach time scales 1,000 times longer than similar experiments.

"It's like the difference between traveling to the store versus going to the moon," says Pande.

Trading continuity for efficiency

Molecules are minuscule and vibrate rapidly, making them difficult to study experimentally. Even when using computer models that calculate changes every femtosecond (one quadrillionth of a second), challenges remain. Researchers must first create an accurate model of molecular changes, then run it long enough and with short enough time steps (the time interval at which the model solves its internal equations) to simulate realistic processes, and finally analyze the masses of data that result from the model runs.

"It isn't very interesting change until it accumulates for millions of steps," says Robert Brunner, a research programmer at NCSA who helped the Pande team make the best use of the computational and storage resources of Blue Waters to interpret their results.

Pande estimated that at 10 nanoseconds per day of in-model time, it would take 1 million days, or roughly 3,000 years, to complete a single simulation of protein folding on a single processor.
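The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: the two input figures come from the quote above, while the function names and the 365.25-day year are assumptions made for illustration. The 10 ms total is implied by the numbers rather than stated in the article.

```python
# Back-of-the-envelope check of the estimate quoted above.
NS_PER_DAY = 10               # simulated nanoseconds per wall-clock day (from the article)
WALL_CLOCK_DAYS = 1_000_000   # wall-clock days (from the article)
DAYS_PER_YEAR = 365.25        # assumption: average calendar year


def simulated_milliseconds(ns_per_day, days):
    """Total in-model time, converted from nanoseconds to milliseconds."""
    return ns_per_day * days / 1_000_000  # 1 ms = 1,000,000 ns


def wall_clock_years(days):
    """Wall-clock days expressed in years."""
    return days / DAYS_PER_YEAR


total_ms = simulated_milliseconds(NS_PER_DAY, WALL_CLOCK_DAYS)
years = wall_clock_years(WALL_CLOCK_DAYS)
print(f"{total_ms:g} ms of in-model time would take about {years:,.0f} years")
```

The result lands a little above 2,700 years, which the article rounds to 3,000.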

Proteins don't start with a single well-defined structure. Rather, they go through a process called folding that determines their three-dimensional structure. Proteins can only function properly if they fold into the correct structure. Most folding errors are benign—but not all.

Many serious diseases, like Alzheimer's, some cancers, Parkinson's, and mad cow disease, result from errors in protein folding, the process by which protein molecules in the body take their shape. Pande's group aims to discern which errors lead to disease, how the errors happen, and what kinds of medicine may prevent the folding errors or mitigate their effect.

"It's like one of these sci-fi movies where you're trying to fight a shape shifter," says Pande.

Complementary approaches

Experimental tools can show either structure or dynamics of proteins at high resolution, but not both at the same time. As in an increasingly diverse array of fields, computer simulations can provide the detail that observations lack.

Traditionally, large-scale MD simulations involve running long simulations on the high-powered, networked processors of a supercomputer. Klaus Schulten, director of the theoretical and computational biophysics group at the University of Illinois, pioneered the use of graphics processing units (GPUs) to speed up these simulations on supercomputers with his award-winning code called NAMD (Nanoscale Molecular Dynamics). The slowest part of such a large-scale model run is transferring information between the cores.

Pande traded the continuity of a large-scale model run for even more efficient parallelization, which avoids the bottleneck of communicating information between cores. On top of that, a suite of shorter, independent simulations can run on heterogeneous hardware, such as cloud resources, and handles hardware failures better: if a single simulation dies, the rest continue. Using the same amount of computing time, Pande's code completes many short model runs that effectively examine the state of a molecule at various times throughout a similar long run, providing a complementary approach to a traditional MD simulation.
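A minimal sketch can illustrate why independent runs tolerate failures. It is not Folding@home's actual scheduler. The function names, the random walk standing in for an MD trajectory, and the 20 percent failure rate are all hypothetical. The point is only that each run shares nothing with the others, so a lost run can be recorded and resubmitted without disturbing the rest.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def short_run(seed, n_steps=1000, fail_rate=0.2):
    """Stand-in for one short MD trajectory: a 1-D random walk.

    Raises RuntimeError to mimic a volunteer machine dropping out
    (hypothetical failure model, for illustration only).
    """
    rng = random.Random(seed)
    if rng.random() < fail_rate:
        raise RuntimeError(f"run {seed} lost its host")
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
    return x


def run_ensemble(seeds):
    """Launch every run independently; collect results and failures separately."""
    finished, failed = {}, []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(short_run, s): s for s in seeds}
        for fut in as_completed(futures):
            seed = futures[fut]
            try:
                finished[seed] = fut.result()
            except RuntimeError:
                failed.append(seed)  # one dead run does not stop the others
    return finished, failed


finished, failed = run_ensemble(range(20))
print(len(finished), "runs finished;", len(failed), "failed and can be resubmitted")
```

A tightly coupled simulation has no equivalent of the `failed` list: if one node drops out mid-step, the whole job typically has to restart from its last checkpoint.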

Results from the first generation of Pande's Folding@home MD runs pass to Blue Waters. A tool called MSMBuilder clusters the first-generation results into microstates, or groups of molecules that are similar in structure. It then identifies which molecules have reached a long-lived, or metastable, state. Some of these microstates become the starting points for the second round of model runs, and that subgroup passes back to Folding@home for a second generation of runs. This process may iterate several times during a single experiment.
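The cluster-then-reseed loop described above can be sketched on toy data. This is an illustration under stated assumptions, not the Pande group's pipeline. The "conformations" are made-up 2-D points, the clustering is a simple greedy k-centers routine chosen for brevity, and restarting from the least-populated microstates is one common heuristic rather than the selection rule the article describes. All function names are hypothetical.

```python
import math
import random


def k_centers(points, k):
    """Greedy k-centers: repeatedly promote the point farthest from all
    existing centers. Returns (center_indices, assignment_per_point)."""
    centers = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    assign = [min(range(k), key=lambda c: math.dist(p, points[centers[c]]))
              for p in points]
    return centers, assign


def pick_seeds(centers, assign, n_seeds):
    """Return the center frames of the least-populated microstates
    (assumed heuristic: under-sampled regions get the next runs)."""
    counts = [assign.count(c) for c in range(len(centers))]
    order = sorted(range(len(centers)), key=counts.__getitem__)
    return [centers[c] for c in order[:n_seeds]]


rng = random.Random(0)
# Toy "conformations": 2-D points around three wells, unevenly sampled.
wells = [((0, 0), 200), ((5, 5), 50), ((10, 0), 10)]
frames = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for (cx, cy), n in wells for _ in range(n)]

centers, assign = k_centers(frames, k=3)
seeds = pick_seeds(centers, assign, n_seeds=2)
print("restart the next generation from frames:", seeds)
```

In a real workflow the frames would be full atomic coordinates compared with a structural distance, and the seeds would be shipped back to the distributed machines as starting structures.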

“Cellular intercoms”

Pande’s group applied this technique to study G-protein-coupled receptors, or GPCRs, using Google Exacycle. Xavier Deupi, a scientist at the Paul Scherrer Institute, calls GPCRs “cellular intercoms” that coordinate activity of cells and tissues.

The group simulated 2.15 ms of a GPCR called β2-adrenergic receptor, or β2AR, which binds preferentially with epinephrine (more commonly known as adrenaline). β2ARs are more widespread than β1ARs or β3ARs, affecting the lungs, gastrointestinal tract, liver, uterus, vascular smooth muscle, and skeletal muscle.

A Nature Chemistry paper published earlier this year about the Pande group's research stated that more than one-third of marketable drugs target GPCRs. β2AR antagonists, a subset of beta blockers, reduce the effect of adrenaline. β2AR agonists mimic adrenaline at the receptor and cause the opposite effect of beta blockers, relaxing smooth muscle. For example, albuterol relaxes muscles around the bronchial tubes and makes breathing easier for people with asthma. β2AR is also implicated in type-2 diabetes and obesity.

Exactly how the GPCR intercom works and the effect drugs have on it is the subject of this and similar studies. Markov state models and Monte Carlo sampling allowed researchers to stitch together many short model runs to create a few 150 μs trajectories. These mini-movies help the researchers see how β2ARs work and how β2AR-related drugs work.
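The idea of stitching short runs into a long trajectory can be shown with a toy Markov state model. This sketch assumes the short runs have already been reduced to sequences of discrete state labels. The labels A, B and C, the sample data, and the function names are all hypothetical. Real models also involve choices this sketch ignores, such as the lag time between counted frames.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Count observed state-to-state jumps and row-normalise into probabilities."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def sample_long_trajectory(T, start, n_steps, rng):
    """Monte Carlo walk on the model: one long synthetic path built from
    statistics gathered over many short runs."""
    path = [start]
    for _ in range(n_steps):
        row = T.get(path[-1])
        if not row:  # state never seen with an outgoing jump
            break
        states, probs = zip(*row.items())
        path.append(rng.choices(states, weights=probs)[0])
    return path


# Four short, independent runs (made-up data).
short_runs = [
    ["A", "A", "B", "B"],
    ["B", "B", "C", "C"],
    ["C", "C", "B", "A"],
    ["A", "B", "B", "C"],
]
T = estimate_transitions(short_runs)
long_path = sample_long_trajectory(T, "A", 50, random.Random(1))
print("".join(long_path))
```

No single short run here is longer than four frames, yet the sampled path covers 51 frames, because each step only needs the jump probabilities out of the current state.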

Studies of other biological systems may benefit from a similar cloud-based method, according to the Nature Chemistry paper.

Exa-MD of the future

Supercomputing and cloud computing power will both increase in the future, says Brunner. Both are a boon for work like Pande's. Greater cloud computing power will allow Pande's individual runs to complete more quickly, while increased supercomputing power will guide efficient use of cloud resources and increase the ability to analyze the sea of data the cloud produces.

Pande does not see his work competing with those who do traditional NAMD runs. To scale to the exascale computers of the future, he envisions a suite of long runs: in other words, a combination of his and more traditional supercomputing methods.

"Let's say [a code] scales to a thousand cores but the machine has a million cores. We could run a thousand simulations each of a thousand cores," says Pande. "We could run a hundred thousand such systems and that's how you scale."

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.