Enabling Long-Term Reuse of Experimental and Computational Datasets on Protein Dynamics

Diwakar Shukla
College: Liberal Arts and Sciences
Award year: 2019-2020
NCSA collaborators: Luigi Marini

Modern molecular simulations of proteins on high-performance computing resources such as Blue Waters generate extensive, atomistically detailed information about protein dynamics, which could be leveraged to gain insights into the molecular origins of human diseases, the design of therapeutics, and the bioengineering of plants. However, the key challenge is to convert the terabytes of biomolecular dynamics data generated on supercomputers into a format accessible to experimental researchers. In this proposal, we present an approach that not only suggests optimal experiments based on simulation data (e.g., for validating simulations) but also integrates existing experimental and simulation information to generate comprehensive models of protein dynamics that are missing from the current literature. We have developed algorithms that maximize information gain in the design of experiments given simulation data. We propose to work with NCSA collaborators to implement a cloud-based platform and a user interface for this service. NCSA will benefit from this project by gaining expertise in applying cyberinfrastructure to biomolecular dynamics. The biggest impact of the proposed study is an accessible tool that helps experimental researchers harness the knowledge hidden in the large protein simulation datasets generated on Blue Waters and other high-performance computing resources. This work will have a transformative impact on how protein science is conducted by experimental and computational research groups.