Making sense of the complex

10.09.18

by Barbara Jewett

When they learned that a collaborator had published research showing that long-term use of synthetic estrogen alters the microbiome in the guts of mice, the Visual Intelligence for Biology (VI-Bio) group took a brief moment to congratulate themselves on once again fulfilling their mission, then dove back into their work.

Making sense of complex biological data is the daily routine for the VI-Bio group at the National Center for Supercomputing Applications (NCSA). What's not routine are the outcomes: the VI-Bio group applies machine learning approaches to analyze the data and transforms the abstract biological results into visual form.

"We take complex biological data and help researchers make sense of it," says team leader Colleen Bushell. Identifying relevant features in these data and presenting the complexity in a visual form improves comprehension, she explains, which in turn leads to new discoveries and insights. The group builds visual analytic tools for studying genomic and related data.

Zeynep Madak-Erdogan's research showing that long-term use of synthetic estrogen alters the microbial composition in the guts of mice is a recent example. The University of Illinois professor of food science and nutrition, who is also an NCSA faculty affiliate, leads the Women's Health, Hormones and Nutrition lab at the U of I. Her team studied how synthetic estrogen is metabolized in the intestinal tract, and learned that changing the chemistry in the gut by taking probiotics might be a way to change the half-life and properties of the estrogen. This means that post-menopausal women could get the benefits of hormone therapy without increasing their risk of developing reproductive cancers. This work was published in Scientific Reports.

VI-Bio data scientists Loretta Auvil and Michael Welge worked closely with her team to understand their research questions and design multiple computational experiments to identify and execute the most promising analytical approach for their specific problem. They are currently working with Madak-Erdogan to analyze data for early prediction of toxicity of small molecules that could lead to cancer in mammals. Computer modeling and machine learning methods that predict toxicity early and accurately will enable agricultural companies and regulatory institutions to perform more focused and targeted studies, either for toxicity assessment or for refining chemicals such as those used in farming. Such methods are also less expensive than laboratory studies and provide information more quickly.

The VI-Bio group, which also includes Matt Berry, Peter Groves, Lisa Gatzke, and Xiaoxia Liao, has multiple collaborators, including the Mayo Clinic and Northwestern University. They build professional-quality software and apply their design methodology to create interactive data visualizations and complex user interfaces that allow researchers to execute machine learning analyses and study the results. These projects help scientists, doctors, and patients understand biological mechanisms and genetic variations. One example is the group's OmiX tool for studying the microbiome and its relation to human health.

The information design process begins by learning and understanding the series of questions the researcher is asking of the data. This knowledge is then used to develop an organizational structure and a visual hierarchy that reflects the hierarchy of the researcher’s thought process. The result, says Bushell, is a different approach than if the content is organized based on the technical processes, or by attempting to communicate all of the complexity and details at once.

Several of VI-Bio's projects have led to new insights into human diseases, including cancer, chronic pelvic pain, latent tuberculosis, and heart disease.