Information Visualization in Comparative and Functional Genomics
College: Biotechnology Center
Award year: 2000-2001
The rapid advancement of genomic technology is changing the way research is conducted in molecular biology. Various genome projects have produced a rich body of data for comparative analysis of different genomes. Microarray technology has opened a completely new phase of functional genomics. As a result, biologists have gained access to complex databases that are orders of magnitude larger than before. Gigabytes of data must be turned into actionable knowledge, transforming complex information into insight. Without the right tools, new discovery is hampered by information overload. A new class of data exploration tools called "Information Visualization" provides powerful ways to understand and solve complex, data-intensive problems.
The objectives of this proposal are to integrate experimental data from microarrays with other information, such as metabolic pathways, and to develop new tools for visualizing complex, multidimensional expression profiling data in the context of the genome, cellular processes, and developmental stages. We will explore possible Information Visualization solutions and techniques and develop prototype programs using data warehouse, data drilling, and visual data mining techniques.
Several issues need to be addressed in dealing with data from microarray analysis. The first is the analysis of expression patterns using clustering techniques. The second is the development of a database to organize and manage the experimental data, metabolic pathway information, and genome map and sequence data. The third is the development of novel tools for visualizing the integrated information.
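To make the first issue concrete, the following is a minimal sketch of clustering expression profiles. The proposal does not name a specific clustering method, so average-linkage agglomerative clustering with a Pearson correlation distance is used here purely as an assumption. The gene names, the values, and the function names `pearson_distance` and `cluster_profiles` are hypothetical.

```python
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / sqrt(var_a * var_b)


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering of {gene: [values]}.

    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[gene] for gene in sorted(profiles)]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [pearson_distance(profiles[g], profiles[h])
                 for g in c1 for h in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up log ratios over four conditions (illustrative only).
example_profiles = {
    "geneA": [0.1, 1.0, 2.0, 1.1],
    "geneB": [0.2, 1.2, 2.1, 0.9],
    "geneC": [2.0, 1.0, 0.1, 0.9],
    "geneD": [1.8, 0.9, 0.2, 1.0],
}
example_clusters = cluster_profiles(example_profiles, 2)
```

In this toy input, genes whose profiles rise and fall together end up in the same cluster. This pairwise approach scales poorly with the number of genes, so a real data set would likely call for an optimized library implementation.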
Several ideas will be implemented in the project. One is to overlay expression profiling data on a metabolic pathway map to show directly where and how much the expression of a particular gene changes and which pathway it affects. In a preliminary study, we integrated yeast gene expression data with yeast metabolic pathway information in a 2D map representation. Another is to present expression profile data, pathway information, and genome information at the same time on a 3D panel. Users would be able to see the overall effects of a treatment first and then zoom into the detail to find individual genes. For time series experiments, the time factor can be incorporated using animation. Users will be able to pause the animation at any point in the time course. Together, time series animation and the 3D panel will allow users to look at the expression data of all genes at a particular time or at the expression pattern of a few genes through the time course.
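The overlay and animation ideas can be sketched as a mapping from expression values to colors, computed once per time point. This is only an illustration under stated assumptions: the red/green color scale, the saturation limit `max_abs`, the function names `ratio_to_color` and `overlay_frame`, and all pathway names, gene names, and values are hypothetical and not taken from the project.

```python
def ratio_to_color(log_ratio, max_abs=2.0):
    """Map a log expression ratio to a hex color.

    Green means decreased expression, black unchanged, red increased.
    Values beyond +/- max_abs saturate at full intensity.
    """
    clamped = max(-max_abs, min(max_abs, log_ratio))
    intensity = round(255 * abs(clamped) / max_abs)
    if clamped >= 0:
        return "#{:02x}0000".format(intensity)
    return "#00{:02x}00".format(intensity)


def overlay_frame(pathways, series, time_index):
    """Return {pathway: {gene: color}} for one time point.

    pathways maps a pathway name to its member genes; series maps a gene
    to its log ratios over the time course. Genes without data are skipped.
    """
    frame = {}
    for pathway, genes in pathways.items():
        frame[pathway] = {
            gene: ratio_to_color(series[gene][time_index])
            for gene in genes
            if gene in series
        }
    return frame


# Made-up pathway membership and a two-point time course (illustrative only).
example_pathways = {
    "pathway_1": ["gene1", "gene2"],
    "pathway_2": ["gene3", "gene4"],  # gene4 has no expression data
}
example_series = {
    "gene1": [0.0, 2.0],
    "gene2": [0.0, -2.0],
    "gene3": [0.0, 0.0],
}

# One frame per time point; an animation would step through this list,
# and pausing corresponds to holding a single index.
example_frames = [overlay_frame(example_pathways, example_series, t)
                  for t in range(2)]
```

Each frame gives the colors for all genes at one time, while reading one gene across frames gives its pattern through the time course, which matches the two views described above. Drawing the colors on an actual 2D map or 3D panel is left out of this sketch.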
Because yeast has a relatively small genome, we will start implementing our prototype programs using yeast data. Based on the preliminary yeast database, we will first build a prototype 3D visualization and then scale it up to handle the much larger data sets of soybean and cattle. The ultimate goal is to build a workable system that can integrate and visualize microarray data from any species.