UI researchers use supercomputer to develop techniques for more accurate evolutionary trees

07.12.17

by Susan Szuch

The University of Illinois at Urbana-Champaign has a bit of a history with genomics, to put it mildly. In 1977, microbiologist Carl Woese uprooted the tree of life, a concept dating back to the early 19th century that explored how organisms were related and evolved. Woese is credited with discovering the third domain that organisms could fall into—Archaea—consisting of single-celled organisms that are vastly different from bacteria, plants or animals. This discovery changed how researchers viewed biology, as Archaea was practically a new life form.

The tradition continues with the Carl R. Woese Institute for Genomic Biology (IGB) on the Illinois campus. Tandy Warnow, a National Center for Supercomputing Applications (NCSA) faculty affiliate, a professor of computer science and bioengineering and a researcher at IGB, is improving the way research is conducted with the help of the Blue Waters supercomputer at Illinois' NCSA. She and her team developed computer science techniques that give the most accurate evolutionary trees currently possible.

Evolutionary trees are diagrams that show how organisms may have evolved based on similarities and differences, much like how a family tree shows who your ancestors were. Typically, these trees are built by looking at a specific gene in a collection of specific species. However, if you look at a different gene, the evolutionary tree can be different. You might assume that all genes share the same evolutionary tree, but that's not necessarily true.

"How do you get to the question of how a collection of species has evolved, given that the genes have different trees?" Warnow says. "That's something we have worked on and have a really good method to evaluate, and what we've been doing with Blue Waters is designing and studying methods to understand and figure out the species tree from a collection of gene trees."

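To make the question concrete, here is a minimal Python sketch of one very simple way to summarize conflicting gene trees: a majority vote over three species. It is a toy illustration only, not the method Warnow's team developed. The species names, the data and the function name `majority_sister_pair` are all hypothetical.

```python
from collections import Counter


def majority_sister_pair(gene_trees):
    """Return the most frequent sister pair and how many gene trees support it.

    Each gene tree covers the same three species and is described only by
    the two species that are each other's closest relatives in that tree.
    """
    # frozenset makes ("duck", "chicken") and ("chicken", "duck") count as the same pair
    counts = Counter(frozenset(pair) for pair in gene_trees)
    best_pair, votes = counts.most_common(1)[0]
    return best_pair, votes


# Hypothetical gene trees: each entry names the sister pair found for one gene
gene_trees = [
    ("chicken", "duck"),
    ("duck", "chicken"),
    ("duck", "ostrich"),
    ("chicken", "duck"),
    ("chicken", "ostrich"),
]

pair, votes = majority_sister_pair(gene_trees)
print(sorted(pair), votes)  # prints ['chicken', 'duck'] 3
```

With this made-up data, three of the five gene trees pair chicken with duck, so that grouping wins the vote even though two genes tell a different story. Real analyses involve far more species and genes, and a simple vote like this does not extend to them directly.
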
This is not her first encounter with supercomputing and evolutionary trees.

"We design innovative computer science techniques to (get high accuracy on large data sets), and then we apply them to biological data and simulated data to see how well they work, and then we fine-tune the algorithms so we get the best accuracy we can," Warnow says. "We're going after really big data sets, aiming for high accuracy on really big data sets."

In 2014, she was part of a consortium that analyzed how orders of birds are related through evolutionary trees. Had the analysis been performed on a single CPU, it would have taken 250 years.

But birds are not the only subjects of this research. The methods also apply to humans, specifically to the bacteria in our guts. Bacteria in our digestive tracts have the potential to make us sick or keep us healthy, and these advances in building evolutionary trees with the help of Blue Waters make it easier to understand which specific bacteria are present and how they got there.

The work that Warnow and her team do will help biologists all over the world create more accurate evolutionary trees. With more accurate evolutionary trees, researchers will be able to answer other biological questions, such as how a species adapts to an environment.

Although much has been learned, Warnow says that "there are just so many unanswered questions." Through her continued use of Blue Waters, Warnow hopes that she can answer some of them.

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.