Gimme structure

10.30.07 -

University of Utah researchers use parallel genetic algorithms to predict crystal structures for a variety of organic substances.

NCSA has helped Julio Facelli do his work for years, and that gives him an uncommon perspective on the impact that the center and others like it have had.

"I've been following [the National Science Foundation-supported centers] since the beginning," says Facelli, "and they've done a tremendous job of encouraging new approaches to science. In the beginning, users were the traditional suspects. But now more and more people realize that they need computational simulation to understand their experiments. In every field, the centers have had an impact on the way people do science."

"Simulation science is becoming part of the mainstream toolkit of the experimental scientist. The NSF centers drove that."

Facelli, director of the University of Utah's Center for High Performance Computing and a biomedical informatics professor, uses NCSA's Mercury cluster to predict crystal structures for organic molecules that are frequently used in pharmaceuticals, fertilizers, and explosives.

"Fifteen years ago, the drug companies wouldn't think about talking to a guy like me," he says. "But computational molecular science—numerical analysis—can now show them the promise of a good solution for their formulation problems."

Supercomputing is part of the reason for this shift, but so is the method that he and his team have developed for calculating crystal structures. This method was featured in a 2007 Journal of Chemical Theory and Computation publication.

Survival of the fittest

Crystal structure refers to the identical pattern of atoms that repeats over and over to form the macroscopic material. The details of this microscopic duplication have a big impact on the substance's macroscopic properties, influencing features like solubility, reactivity, and color.

Modeling the crystal structure of a given substance, Facelli and his team begin with nothing more than the atoms in the molecule and the nature of their bonds. They're looking for structures with the lowest energies, which typically mark the molecules' standard crystal structures or something very close. The problem is that this straightforward data and straightforward goal create billions of possible solutions. Just imagine finding the needle of the lowest energy in that haystack of possible structures.

To narrow the search to more likely candidate structures, the team uses a parallel genetic algorithm called MGAC that they have developed over the previous six or seven years drawing on systems at NCSA and on TeraGrid.

"An exhaustive search is not feasible, so we have to have a way to direct the search," says Facelli. "[Genetic algorithms] are based on the principle of survival of the fittest. Trial solutions compete with one another in the population for survival and produce offspring for the next generation of tests. These algorithms offer excellent scaling properties, which make them good for large-scale parallel computing systems like those at NCSA and emerging computational grids like TeraGrid."

For example, an initial calculation on NCSA's Mercury may run 20 simulations on 20 different processors simultaneously, calculating possible crystal structures for a given set of atoms and their bonds. The energies for these structures are compared. The 10 with the lowest energies are kept, and the features of those 10 are mixed and matched to generate the structures for another 10 possible structures. Energies are calculated again, comparisons are made, best candidates are kept, and the cycle continues.

The "mating operation," as the mixing and matching is called, stagnates quickly, producing very similar structures over the course of thousands of generations. To combat this lack of variety, the genetic algorithm also introduces arbitrary mutations into the process, occasionally taking one variable from one of the best candidates and including a random number for that variable in the next generation.

From billions to hundreds

Even with a genetic algorithm greatly reducing the number of candidate structures and the amount of time it takes to find them, the researchers are still frequently left with more than a million possibilities. To narrow it down further, they find and remove the many structures that are the same physically but have very slightly different profiles numerically, "essentially getting rid of the rounding errors," Facelli says.

Next, they eliminate candidates that are the same structurally but that have revealed themselves in different orientations. This post-processing, which is done automatically at Utah's Center for High Performance Computing, eventually gets the number of candidates down to a couple of hundred.

"It's important to understand we're not so much trying to predict the exact structure. There are dynamic factors influencing the crystal growth, which means that the experimental structure might sometimes be higher than the lowest energy," Facelli says. Experts can compare those structures remaining and get it down to 10 or 20 that they want to test in the lab. "We take it from the billions of possibilities and say, 'Here are the most probable.'"

For more information: http://www.chpc.utah.edu/~facelli/

This research is supported by the National Science Foundation and the National Institutes of Health.

Team members
Victor Bazterra
Julio Facelli
Marta Ferraro
Matthew Thorley