An amazing race

08.24.07

By Kathleen Ricker

Six amino acid sequences. Thousands of atoms. Millions of time steps. Could protein researchers using NCSA's Tungsten solve these protein structures...and test out a theory about how they folded...all in three months?

In their folded states, no two protein structures are alike. Typically composed of anywhere from 50 to 300 amino acids, proteins are responsible for nearly every cellular function in the body. They transport and break down nutrients and waste, and they provide the conduit for communication between cells that makes complex structure possible in higher organisms. Proteins accomplish all of this by folding into a compact, native shape that allows them to bind and interact effectively and selectively with other molecules. In this native state, a protein's oily carbon atoms cluster tightly in the molecule's center, while water-soluble atoms such as nitrogen and oxygen form an outer layer in contact with the surrounding water.
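
The idea that oily atoms bury themselves in the core is the basis of the HP lattice model, a deliberately simplified folding model developed by Dill and coworkers in the late 1980s. It is not the method used in the CASP work described here. It appears only to make the "oily core" intuition concrete. The sketch assumes a two-dimensional square lattice, labels each amino acid as hydrophobic (H) or polar (P), and scores -1 for every pair of H residues that touch on the lattice without being bonded neighbors along the chain. The function name `hp_energy` and the example sequence are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of a chain on a 2D square lattice under the HP model.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords: one (x, y) lattice point per residue, in chain order.
    Each H-H pair that touches on the lattice but is not bonded scores -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain overlaps itself")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {point: i for i, point in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up, so each lattice contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


# Hypothetical four-residue chain: folded into a square vs. stretched out.
compact = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", compact))   # -1: the two H residues touch
print(hp_energy("HPPH", extended))  # 0: no H-H contact
```

Lower energy means a more favorable conformation. In this toy, the compact shape that brings the two oily residues together is therefore preferred.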

Each protein collapses into a unique configuration that is responsible for that protein's functional properties, which in turn affect the way it interacts with other proteins. "If you can predict what shape a protein will fold into," says Scott Shell, a chemical engineer at the University of California at Santa Barbara, "you can tell a lot about the function of that protein and what it will do." Shell emphasizes that this knowledge, a crucial component of the fields of genomics and proteomics, is also particularly important to the discovery of treatments for neurodegenerative diseases such as Alzheimer's, which is believed to result from the accumulation of misfolded proteins.

However, figuring out just how proteins fold is, according to Shell, "the most prominent grand challenge in theoretical biology right now." Shell and his former postdoctoral supervisor, Ken Dill, a professor of biophysics at the University of California at San Francisco, used NCSA's Tungsten to predict the structures of several proteins with an innovative, computationally intensive method for investigating the physics of the folding process.

An alternative route to the finish line

Last year, Dill, Shell, and their collaborators took part in CASP (Critical Assessment of Techniques for Protein Structure Prediction), a community-wide competition that lets researchers evaluate how well their structure prediction techniques perform. Protein crystallographers and NMR spectroscopists contribute protein structures that have not yet been made public. These structures are kept secret, but the amino acid sequences of these target proteins are distributed to participants, who attempt to compute the positions of all the atoms and so predict each target's structure. In 2006, 250 groups participated in CASP7.
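
To judge a prediction, CASP assessors compare the predicted atom positions with the experimentally determined ones. They use several measures, notably GDT_TS. The simplest one to illustrate is the root-mean-square deviation (RMSD) between matching atoms. The sketch below assumes the two structures have already been optimally superimposed (real tools first find the best rotation and translation, for example with the Kabsch algorithm) and that atom i in one list corresponds to atom i in the other. The function name and coordinates are made up for illustration.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are listed
    in the same order in both lists.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Hypothetical three-atom example, coordinates in angstroms.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(round(rmsd(model, native), 3))  # prints 0.577
```

A smaller RMSD means the prediction sits closer to the experimental structure. Identical structures score zero.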

One dependable approach to predicting a protein's structure relies on bioinformatics. Researchers use sequence comparison tools such as BLAST or FASTA to match the target's amino acid sequence against a database of known sequences covering tens of thousands of protein structures. They then draw on the structures of the closest matches to infer the target's structure.
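
BLAST and FASTA are fast heuristics that approximate local sequence alignment. The sketch below swaps in the exact but slower Smith-Waterman algorithm that they approximate. It uses simple match, mismatch, and gap scores chosen purely for illustration; real tools use substitution matrices such as BLOSUM62 and affine gap penalties. The function name, target sequence, and database entries are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between sequences a and b.

    Fills the Smith-Waterman dynamic-programming table, where each cell holds
    the best score of an alignment ending at that pair of positions. Scores
    below zero are reset to zero, so an alignment can start anywhere.
    The scoring values are illustrative, not those of any real tool.
    """
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,       # a[i-1] against a gap in b
                h[i][j - 1] + gap,       # b[j-1] against a gap in a
            )
            best = max(best, h[i][j])
    return best


# Hypothetical target sequence and database entries, for illustration only.
target = "MKTAYIAK"
database = {"entry_1": "GGMKTAYIAKGG", "entry_2": "WWWWWWWW"}
scores = {name: smith_waterman_score(target, seq) for name, seq in database.items()}
best_hit = max(scores, key=scores.get)
print(scores, best_hit)  # {'entry_1': 16, 'entry_2': 0} entry_1
```

The highest-scoring database entry is the best candidate template. A bioinformatics pipeline would then borrow that entry's known structure as a starting point for the target. When no entry scores well, as with a protein unrelated to anything in the database, this strategy has little to work with.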

Almost all CASP participants use bioinformatics methods, which produce good results for many of the target proteins. However, there are a number of scenarios in which bioinformatics methods do not perform well. For instance, when drugs bind to proteins, they may do so in a way that changes the protein's structure. Or the protein may simply be unrelated to any of the proteins included in the database.

Instead, Dill's group chose a physics-based approach, which they believe offers a general and potentially powerful folding strategy. Physics-based approaches to protein folding are used far less often because the time steps needed to simulate atomic motions are extremely small: on the order of a femtosecond, or one millionth of a billionth of a second. "You need to be faithful to the atomic dynamics from one instant to the next, otherwise you violate Newton's laws," says Dill. Simulating a single protein means precisely tracking these tiny movements for tens of thousands of atoms. Because folding takes anywhere from microseconds to seconds, even one microsecond of simulated time requires about a billion femtosecond time steps. And that is before accounting for the water surrounding the protein. Consequently, only a few groups other than Dill's use physics-based approaches to protein folding, and they rely on vast amounts of computing power. Examples include Folding@home, a SETI@home-style project that borrows unused computing cycles from 100,000 distributed processors, and IBM's Blue Gene supercomputer.
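
The arithmetic behind the time-step problem is easy to check. The sketch below first counts the femtosecond steps needed to reach a few spans of simulated time. It then uses velocity Verlet, a standard molecular dynamics integrator (not necessarily the one used by any group named here), on a hypothetical one-dimensional spring in reduced units. The spring shows why the step cannot simply be made larger: once the step is too coarse relative to the fastest vibration, energy is no longer conserved and the simulation blows up. In real proteins the fastest motions are bond vibrations involving hydrogen atoms, with periods of roughly ten femtoseconds, which is what keeps the step near one femtosecond. Function names and parameter values are illustrative assumptions.

```python
TIME_STEP_S = 1e-15  # one femtosecond, the step size quoted in the text


def steps_needed(simulated_time_s, time_step_s=TIME_STEP_S):
    """Number of integration steps needed to cover a span of simulated time."""
    return round(simulated_time_s / time_step_s)


def max_energy_error(dt, n_steps, k=1.0, m=1.0, x0=1.0):
    """Velocity Verlet on a 1D harmonic spring (reduced units, period 2*pi).

    Returns the largest relative deviation from the starting energy,
    a simple check of how faithfully the integrator follows the dynamics.
    """
    x, v = x0, 0.0
    a = -k * x / m
    e0 = 0.5 * k * x * x
    worst = 0.0
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # advance position
        a_new = -k * x / m                # force at the new position
        v += 0.5 * (a + a_new) * dt       # advance velocity with averaged force
        a = a_new
        energy = 0.5 * m * v * v + 0.5 * k * x * x
        worst = max(worst, abs(energy - e0) / e0)
    return worst


for label, seconds in [("1 ns", 1e-9), ("1 us", 1e-6), ("1 ms", 1e-3)]:
    print(f"{label}: {steps_needed(seconds):.0e} femtosecond steps")

print(max_energy_error(dt=0.05, n_steps=1000))  # small step: error well under 1%
print(max_energy_error(dt=2.5, n_steps=100))    # step too large: energy explodes
```

A millisecond of folding, for example, would take about 10^12 steps. Every one of those steps requires computing the forces on every atom, and that cost is what drives physics-based groups to supercomputers and distributed computing.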

Protein folding from the bottom up

The physics-based approach Dill's group used was based on a model called zipping and assembly. "It's very efficient, it's a good way of very quickly short-circuiting a lot of possible conformations the protein wouldn't search in the first place," says Dill. Zipping and assembly begins in small, local segments of the amino acid chain, where a run of a few neighboring amino acids prefers to settle, or "zip," into a particular structural arrangement, or conformation. The same thing happens simultaneously in other regions along the chain. The structures that form then interact with one another to create larger structures, a process Dill calls assembly, until the entire protein has rearranged itself into its native, folded state.
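
The bookkeeping of this hierarchical process, in which local pieces form first, grow, and then merge, can be illustrated with a toy. This is a cartoon, not the zipping and assembly method itself. In the real method, physics-based simulations decide which segments zip and whether growing pieces assemble. The toy replaces all of that with fixed, hypothetical "seed" segments and a fixed growth rule. Residues are numbered from 0, and each fragment is an inclusive (start, end) interval. The function name and parameters are assumptions for illustration.

```python
def zip_and_assemble(seeds, chain_length, grow=2, max_rounds=20):
    """Toy cartoon of hierarchical folding: grow local fragments, merge neighbors.

    seeds: inclusive (start, end) residue intervals assumed to have zipped
    into stable local structure. Returns the fragment list after each round.
    """
    if not seeds:
        raise ValueError("need at least one seed fragment")
    fragments = sorted(seeds)
    history = [fragments]
    for _ in range(max_rounds):
        # "Zipping": each fragment extends into neighboring residues.
        grown = [(max(0, s - grow), min(chain_length - 1, e + grow))
                 for s, e in fragments]
        # "Assembly": fragments that now touch or overlap join into one.
        assembled = [grown[0]]
        for s, e in grown[1:]:
            last_s, last_e = assembled[-1]
            if s <= last_e + 1:
                assembled[-1] = (last_s, max(last_e, e))
            else:
                assembled.append((s, e))
        fragments = assembled
        history.append(fragments)
        if fragments == [(0, chain_length - 1)]:
            break  # the whole chain forms one structure
    return history


# Hypothetical 30-residue chain with three locally stable seeds.
history = zip_and_assemble([(3, 7), (14, 18), (24, 27)], chain_length=30)
for round_number, fragments in enumerate(history):
    print(round_number, fragments)
```

The toy prints three separate fragments, then three larger ones, then a single fragment spanning the chain. The point is only the shape of the process: the search works on small pieces before it ever considers the whole chain.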

Dill likens zipping and assembly to the process of human speech comprehension: nouns and verbs are grouped together and perceived as meaningful phrases, which in turn can be linked together into larger clauses and, ultimately, into entire sentences. But searching for just the right configuration of amino acids requires rapidly evaluating an enormous number of potential ways in which folding could occur. That is no small task, even for the relatively small proteins Dill's group selected for their simulations, which ranged from 72 to 112 amino acids each. "You have to find the exact conformation, the perfect alignment of all the atoms, and the precise energetics, and you're looking through a tremendous sea of possible conformations," says Shell, who led the computational effort. "It's a very difficult search problem."
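
A back-of-the-envelope count shows how large that sea is. In the spirit of Levinthal's classic estimate, the sketch assumes each amino acid can adopt about three backbone states. It compares searching all residues jointly with searching hypothetical eight-residue fragments one at a time. The fragment figure ignores the cost of assembling the pieces, so it shows only the scale of the savings, not a real cost. The numbers and function names are illustrative assumptions.

```python
import math

STATES_PER_RESIDUE = 3  # illustrative assumption, as in Levinthal's estimate
FRAGMENT_LENGTH = 8     # hypothetical fragment size for a local search


def log10_full_search(n_residues, states=STATES_PER_RESIDUE):
    """log10 of the conformation count when every residue is searched jointly."""
    return n_residues * math.log10(states)


def fragment_search_count(n_residues, fragment_length=FRAGMENT_LENGTH,
                          states=STATES_PER_RESIDUE):
    """Conformations examined if each fragment is searched on its own.

    Ignores the cost of assembling fragments afterward.
    """
    n_fragments = math.ceil(n_residues / fragment_length)
    return n_fragments * states ** fragment_length


for n in (72, 112):  # the size range of the proteins in the text
    print(f"{n} residues: ~10^{log10_full_search(n):.0f} joint conformations, "
          f"{fragment_search_count(n):,} if searched fragment by fragment")
```

Under these assumptions, a 72-residue chain has roughly 10^34 joint conformations, against tens of thousands for the fragment-by-fragment search. That gap is why a method that settles local pieces first can prune so much of the search.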

Dill's group had only three months to complete the simulations. "CASP is kind of a noncanonical way of doing research," says Dill. "For each protein structure we tried to predict, we had only a few weeks before the deadline, so we needed to be able to get priority to queues on certain processors." NCSA biochemist Eric Jakobsson helped Dill and his group get the time they needed on Tungsten and put them in touch with Dave McWilliams in the NCSA Consulting Group, who helped them troubleshoot problems on short notice. They were able to meet the deadlines for all six target proteins.

"We had a month's deadline for each protein simulation, and that was basically the length of time we needed to run them," says Shell. "We were always down to the wire, so it was great to just submit things whenever we could and run jobs without having to worry about waiting in the queue for too long."

On time—and on target

Dill was pleased with the results of the zipping and assembly method. In the end, the physics-based approach predicted the structures of four of the six proteins reasonably accurately. The other two, Shell explains, turned out to be proteins that need to interact with other proteins in order to fold, which made them poor candidates for zipping and assembly. This was not known before the competition started, and no one could have predicted it. "We felt that we did about as well or better than the average performance of all the other groups in CASP, who used what were essentially bioinformatics approaches," says Shell. "We were the only ones with a purely physics-based approach. So we think this proves that physics-based modeling provides potentially very powerful opportunities, and we think this idea of zipping and assembly is a great way to shortcut the huge computational requirements of physics-based methods to fold proteins in a reasonable amount of time."

More importantly, though, Dill and Shell argue that this method could do more than just predict folded protein structures; it may, in fact, explain how proteins actually fold. "There's something a protein knows, something about the energetics that makes it fold efficiently and fast," says Shell. "If you do follow a mechanism-based approach like zipping and assembly, and you get very close to the correct answer, this tells you that the mechanism is a very viable way a protein might fold."

For further information: http://www.dillgroup.ucsf.edu/

This research is supported by the National Institutes of Health and UCSF.

Team members
Kenneth Dill
S. Banu Ozkan
M. Scott Shell
Vince Voelz
Albert Wu