J. William Bell
 A Flower's Family Tree
 200 years in two hours


But mostly, the team lets biologists who use their data look for commonality, consensus, and the like. Bader, Moret, Warnow, and their students are in it for the computational challenge. And the Campanulaceae phylogeny reconstruction certainly gives them that.

At the beginning of the project, they wanted nothing more than to improve BPAnalysis, a code used for breakpoint phylogeny research. BPAnalysis would have required over 200 years to generate, label, and score the nearly 14 billion trees represented in the Campanulaceae problem.
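To see where a figure like "nearly 14 billion" can come from, here is a minimal sketch of the standard count of unrooted binary trees on n labeled leaves, (2n - 5)!!. It assumes the Campanulaceae dataset contains 13 genomes, a number this article does not state, and the function name is purely illustrative.

```python
def count_unrooted_binary_trees(n_leaves: int) -> int:
    """Count distinct unrooted binary trees on n labeled leaves.

    Uses the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if n_leaves < 3:
        return 1  # with fewer than three leaves there is only one shape
    count = 1
    for odd in range(3, 2 * n_leaves - 4, 2):
        count *= odd
    return count


# Assumption: 13 genomes in the dataset (not stated in the article).
N_GENOMES = 13
print(f"{count_unrooted_binary_trees(N_GENOMES):,} candidate trees")
```

Under that assumption the count is a little under 14 billion, and every one of those trees must be generated, labeled, and scored.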

"We started with the goal of reimplementing BPAnalysis from the ground up. It was simply much too slow. We wanted to gain efficiency and speed by improving the algorithm and by parallelizing the code. Just focusing on one of those wouldn't have given us the kinds of improvements we've seen," says Moret.

Albuquerque High Performance Computing Center's LosLobos supercomputing cluster.

BPAnalysis relabels every internal node each time it refines a tree; GRAPPA recalculates labels for only those nodes that could possibly show a change. BPAnalysis looks at identical strings of genes over and over again, even those matching other gene fragments that have already been analyzed; GRAPPA identifies common subsequences and condenses them, leaving fewer genes to be considered. BPAnalysis runs on only one processor; GRAPPA scales linearly to hundreds of processors running in parallel. GRAPPA leaves a scant memory footprint of only 1.6 megabytes and can work almost entirely in a computer's cache memory thanks to a working set of less than 0.5 megabytes. GRAPPA is also modular, allowing different methods of calculating the nodes' labels to be swapped in and out easily.
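The condensing idea can be sketched in a few lines. What follows is an illustration only, not GRAPPA's actual code: it assumes each genome is a linear list of signed gene numbers with every gene appearing exactly once (real chloroplast genomes are circular), and the function names are hypothetical. Whenever two genes sit side by side in every genome, read in either direction, the pair is collapsed into a single gene.

```python
def adjacencies(genome):
    """Return the adjacencies of a signed gene order.

    The pair (a, b) read forward is the same adjacency as (-b, -a)
    read on the opposite strand, so both forms are recorded.
    """
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def merge(genome, a, b):
    """Collapse the adjacent pair a,b (or -b,-a) into the single gene a."""
    out = []
    i = 0
    while i < len(genome):
        has_next = i + 1 < len(genome)
        if has_next and genome[i] == a and genome[i + 1] == b:
            out.append(a)
            i += 2
        elif has_next and genome[i] == -b and genome[i + 1] == -a:
            out.append(-a)
            i += 2
        else:
            out.append(genome[i])
            i += 1
    return out


def condense(genomes):
    """Repeatedly merge adjacencies that are shared by every genome."""
    genomes = [list(g) for g in genomes]
    while True:
        shared = set.intersection(*(adjacencies(g) for g in genomes))
        if not shared:
            return genomes
        a, b = min(shared)  # deterministic choice among shared pairs
        genomes = [merge(g, a, b) for g in genomes]


example = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, -5, -4],
    [-3, -2, -1, 4, 5],
]
print(condense(example))
```

In this toy input, genes 1, 2, and 3 always travel together, as do genes 4 and 5, so five genes condense to two in every genome. Fewer genes means less work each time a tree is scored.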

When recently tested on the Campanulaceae problem on Albuquerque's LosLobos computing cluster, GRAPPA completed the massive reconstruction in one hour and 40 minutes on the machine's 512 Pentium III processors, each running at 733 MHz. That is roughly a 1,000,000-fold speedup over BPAnalysis.
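The arithmetic behind that figure is easy to check. This rough sketch takes the article's numbers at face value (200 years for BPAnalysis, which the article gives as a lower bound, and one hour and 40 minutes for GRAPPA) and assumes perfectly linear scaling across all 512 processors in order to separate the parallel gain from the algorithmic one.

```python
HOURS_PER_YEAR = 365.25 * 24

# Figures quoted in the article.
bpanalysis_hours = 200 * HOURS_PER_YEAR   # "over 200 years", so a lower bound
grappa_hours = 1 + 40 / 60                # one hour and 40 minutes
processors = 512

total_speedup = bpanalysis_hours / grappa_hours

# Assumption: ideal linear scaling, so parallelism contributes a factor of 512
# and the remainder comes from the improved algorithm and implementation.
per_processor_speedup = total_speedup / processors

print(f"overall speedup:       {total_speedup:,.0f}x")
print(f"per-processor speedup: {per_processor_speedup:,.0f}x")
```

On these assumptions, parallelism accounts for a factor of 512 and the reworked algorithm and code for a factor of roughly 2,000, which matches Moret's point that neither improvement alone would have been enough.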

Now that's the sort of thing a computational scientist can get excited about—an example of cluster computing in full bloom.


This research is supported by the Albuquerque High Performance
Computing Center, the Department of Energy's Sandia National
Laboratories, the National Science Foundation, and the David and
Lucile Packard Foundation.

 
