Can You Imagine: Supporting Data Infrastructures with Software?

02.27.19

Data, along with the software used to create and use them, are now a major part of scientific research and education, to the point where many groups are pushing for both to be treated on par with paper publications with regard to dissemination and reproducibility. Preserving, sharing, and using these digital products, however, is far from trivial: it involves many conceptual, technical, and social complexities that are now being addressed in fields such as computer science, information science, and the evolving cross-disciplinary field of data science.

Kenton McHenry, Deputy Director of Scientific Software & Applications at NCSA, has worked with scientific communities across biology, geoscience, and engineering for over 10 years to develop a service that supports a need shared by all of these communities in working with data, specifically data transformations. McHenry is a principal investigator of the NSF-funded Cyberinfrastructure for Sustained Scientific Innovation (CSSI) - Clowder project, in which he and his team explore tools built around the notion of active curation in support of scientific data sharing, management, and reuse.

Active curation addresses the need for curation around scientific data, such as annotating data with descriptive metadata, in order to make it discoverable and usable by others. Specifically, active curation distributes the curation process throughout the lifecycle of the data, leveraging analysis and machine learning to automate a good portion of the process. The Clowder framework provides an open-source, Dropbox-like capability that allows data to be shared as easily as within Dropbox, but further supports the active curation and exploration of data so that it can be more easily published in community data archives at a later time. This effort also addresses the sustainability of scientific tools such as Clowder by exploring potential service models and bringing together a very diverse community of academic, educational, industrial, and international partners, all of which require similar capabilities.

About NCSA

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation's science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50® for more than 30 years by bringing industry, researchers, and students together to solve grand challenges at rapid speed and scale.