Predictive Data Mining of Complex Nutrigenomic Datasets
College: Agricultural, Consumer and Environmental Sciences
Award year: 2005-2006
The reductionist approach to biological research has been successful in identifying mechanisms by which individual components (genes, proteins, metabolites) function. However, it has become clear that an integrative approach is required to fully understand the behavior of the complex biological systems of the body. Given recent advances in biotechnology and computer science, scientists now have the tools to use a "systems biology" approach based on the simultaneous measurement of functional genomic, proteomic, and metabolomic parameters to study and understand biological systems. Using the systems biology concept, phenotypic data, genotypic data (genome sequence information), molecular biological tools, and bioinformatics may be used in concert to tackle complex nutrition "problems." Not surprisingly, as the complexity of the research equation builds, so does the difficulty of interpretation. Thus, the field of bioinformatics, which uses computers to manage and interpret biological information, has rapidly evolved and has become a key part of this research model. Automated learning methods, such as those developed in the Automated Learning Group at NCSA, may not only aid in data management but also enable researchers to perform predictive data mining. Automated learning platforms that can effectively predict biological outcomes from large and diverse datasets will play a vital role in the future of biomedical research.
Using the dog as a nutritional model, our laboratory recently completed a large canine nutritional genomics experiment studying the effects of age and diet on metabolic characteristics and gene expression profiles. The datasets produced, containing vast amounts of phenotypic and genomic data from numerous tissue types, will be used for predictive data mining in this project. The objectives of this project are provided below:
1. Mine the dataset to identify significant relationships between genomic and phenotypic variables.
2. Based on gene expression data, identify molecular "signatures" specific to age, diet, and tissue type.
3. Using the information and "learned" relationships from Objectives #1 and #2, build a model with the ability to:
   - Predict age, diet, and tissue type when provided with metabolic and genomic data of an "unknown" sample.
   - Predict specific metabolic outcomes (health and nutritional status) of an animal based on gene expression data alone.
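As a rough illustration of the prediction step described in the objectives above, the sketch below treats a "signature" as the average expression profile of each class and assigns an unknown sample to the closest one (a nearest-centroid classifier). This is only one simple technique chosen for illustration, not necessarily the method used in this project. The function names, gene count, labels, and expression values are all hypothetical and are not data from the study.

```python
from math import dist
from statistics import mean


def build_signatures(samples, labels):
    """Average the expression profiles of each class into one centroid 'signature'."""
    grouped = {}
    for profile, label in zip(samples, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [mean(gene_values) for gene_values in zip(*profiles)]
        for label, profiles in grouped.items()
    }


def predict(signatures, profile):
    """Return the label whose signature is closest (Euclidean distance) to the profile."""
    return min(signatures, key=lambda label: dist(signatures[label], profile))


# Made-up expression values for three hypothetical genes; not data from the study.
training_profiles = [
    [8.1, 2.0, 5.5],
    [7.9, 2.4, 5.1],
    [3.2, 6.8, 5.3],
    [3.0, 7.1, 5.6],
]
training_labels = ["young", "young", "old", "old"]

signatures = build_signatures(training_profiles, training_labels)

# Classify an "unknown" sample by comparing it with each learned signature.
unknown_sample = [7.5, 2.9, 5.4]
predicted_age_group = predict(signatures, unknown_sample)
print(predicted_age_group)
```

The same pattern would apply to diet or tissue type by swapping the labels. A real analysis would involve thousands of genes and would need normalization, feature selection, and validation on held-out samples, none of which is shown here.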
Identifying unique molecular "signatures" specific to age, diet, and tissue type will not only be useful in the translational piece of our project (how do age and diet affect the various biological systems of the body?), but will also serve as "predictors" for future research projects. This project is envisioned to lay the groundwork for future functional genomics experiments focused on studying the role of nutrition in the aging process and the development of complex diseases.