Past Awardee

Development of Machine Learning-Based Analytics and Visualization Approaches for Predictive Toxicology

Zeynep Madak-Erdogan

College: Agricultural, Consumer and Environmental Sciences
Award year: 2018-2019

This project addresses two key societal problems identified by our Campus Strategic Task Force: Food Security and Cancer. Its main objective, identifying early biomarkers of liver cancer caused by exposure to liver toxicants, addresses a critical problem for both the agricultural industry and human health.

A key aspect of developing novel molecules (such as pesticides) for the agricultural industry is the assessment of acute to chronic mammalian toxicity and risk to human health. To register a chemical with the FDA for use, companies need extensive evidence that it is safe for human health, even after long-term environmental exposure or consumption of contaminated farm products. Relying only on classical approaches to toxicity assessment can require ~6,000 animals per molecule, and animal studies dramatically increase cost and time to market. In many cases, toxicity to human health is detected only during the late stages of product development, resulting in major setbacks to agricultural R&D pipelines. In silico methods that predict toxicity early and accurately would therefore enable agricultural companies and regulatory institutions to perform more focused and targeted studies, either for toxicity assessment or for refining chemicals that are used in farming. Computational analysis of toxicity data is a much-desired capability to mitigate the negative impact on health, minimize cost, save animals, and reduce time to market.

The goal of the proposed project is to develop a machine learning (ML) approach for early prediction of mammalian toxicity of small molecules using gene expression data. Agricultural chemical companies routinely conduct 28-day, 90-day, and 2-year studies in rats and mice. Gene expression data, histopathology images (e.g., from liver), and endpoint information (e.g., necrosis, cell proliferation) are collected at different time points. The aim of this proposal is to create a novel platform that uses cutting-edge ML methods to predict later-stage (e.g., day 28) endpoint toxicity from early-stage (e.g., 9 h or 24 h exposure) gene expression data. The diversity of disciplines (Toxicology, Bioinformatics, and Machine Learning) required for the success of this project necessitates interdisciplinary expertise across NCSA and FSHN. Integrating large, dynamic data sets into meaningful predictive information is a major industry challenge. This project will integrate multiple data streams to produce the toxicity predictions that are highly sought after by the agrochemical industry.
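
To make the prediction setup concrete, the following is a minimal sketch of the general idea of classifying a later-stage endpoint from early-stage expression profiles. It is not the project's method: it uses a simple nearest-centroid classifier as a stand-in for the ML models described above, and it runs on synthetic data. Every name and number in it (`simulate_profiles`, `fit_centroids`, `predict`, 20 genes, 5 marker genes, the size of the expression shift) is a hypothetical chosen for illustration only.

```python
import random


def simulate_profiles(n_compounds, n_genes, rng, toxic_shift=3.0, n_marker_genes=5):
    """Return synthetic early-time-point expression profiles and endpoint labels.

    Assumption for illustration: compounds that later show toxicity (label 1)
    have shifted expression in the first `n_marker_genes` genes.
    """
    profiles, labels = [], []
    for i in range(n_compounds):
        toxic = i % 2  # alternate labels so both classes are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if toxic:
            for g in range(n_marker_genes):
                row[g] += toxic_shift
        profiles.append(row)
        labels.append(toxic)
    return profiles, labels


def fit_centroids(profiles, labels):
    """Compute the mean expression profile (centroid) of each endpoint class."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, y in zip(profiles, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, profile):
    """Assign a profile to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], profile))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_x, train_y = simulate_profiles(80, 20, rng)
test_x, test_y = simulate_profiles(40, 20, rng)

centroids = fit_centroids(train_x, train_y)
predictions = [predict(centroids, p) for p in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real expression data would be far noisier and higher-dimensional than this toy example, and evaluation would need to hold out whole compounds or studies rather than random samples.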