NCSA team gets $10.5 million grant to develop software for un-curated data | National Center for Supercomputing Applications at the University of Illinois

11.14.13

A team at the National Center for Supercomputing Applications (NCSA) has been awarded more than $10 million over the next five years from the National Science Foundation to develop software to manage and make sense of vast amounts of digital scientific data.

"The information age has made it trivial for anyone to create and share vast amounts of digital data, including unstructured collections of images, video, and audio as well as documents and spreadsheets,” says project leader Kenton McHenry, who leads NCSA's Image and Spatial Data Analysis division along with project co-principal investigator Jong Lee. "But the ability to search and use the contents of digital data has become exponentially more difficult."

That's because digital data become trapped in outdated, difficult-to-read file formats and because metadata—the critical data about the data, such as when and how and by whom it was produced—is nonexistent.

The NCSA team, partnering with faculty at the University of Illinois at Urbana-Champaign, Boston University, and the University of North Carolina at Chapel Hill, will develop two services to make the contents of un-curated data collections accessible. The Data Access Proxy (DAP) will chain together open/save operations within software applications in order to seamlessly transform unreadable files into readable ones. The Data Tilling Service (DTS) will serve as a framework for content analysis tools in order to automatically assign metadata to files within un-curated collections.
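To make the two ideas concrete, here is a minimal sketch, not the project's actual design. It assumes a hypothetical catalogue of applications and the formats each can open and save, and finds the shortest chain of open/save steps between two formats with a breadth-first search. It then shows a hypothetical registry of content analysis tools that assigns metadata to a file. All application names, formats, and function names are invented for illustration.

```python
from collections import deque

# Hypothetical catalogue: formats each application can open and save.
APPLICATIONS = {
    "legacy_cad": {"opens": {"dwg_old"}, "saves": {"dxf"}},
    "mesh_tool": {"opens": {"dxf", "obj"}, "saves": {"obj", "stl"}},
    "viewer_export": {"opens": {"stl"}, "saves": {"x3d"}},
}


def find_conversion_chain(source_format, target_format, applications=APPLICATIONS):
    """Return the shortest list of (application, input, output) steps, or None."""
    if source_format == target_format:
        return []
    queue = deque([(source_format, [])])
    seen = {source_format}
    while queue:
        current, steps = queue.popleft()
        for name in sorted(applications):
            app = applications[name]
            if current not in app["opens"]:
                continue
            for output in sorted(app["saves"]):
                if output in seen:
                    continue
                path = steps + [(name, current, output)]
                if output == target_format:
                    return path
                seen.add(output)
                queue.append((output, path))
    return None  # no chain of applications connects the two formats


# Hypothetical registry of content analysis tools, keyed by file extension.
EXTRACTORS = []


def extractor(*extensions):
    """Register a function as an analysis tool for the given extensions."""
    def register(func):
        EXTRACTORS.append((extensions, func))
        return func
    return register


@extractor(".txt", ".csv")
def line_count(name, data):
    return {"line_count": len(data.splitlines())}


@extractor(".csv")
def csv_columns(name, data):
    lines = data.decode("utf-8", errors="replace").splitlines()
    return {"columns": lines[0].split(",") if lines else []}


def assign_metadata(name, data):
    """Run every registered tool that applies to the file and merge the results."""
    metadata = {}
    for extensions, func in EXTRACTORS:
        if name.lower().endswith(extensions):
            metadata.update(func(name, data))
    return metadata


if __name__ == "__main__":
    print(find_conversion_chain("dwg_old", "x3d"))
    print(assign_metadata("table.csv", b"site,year\nA,2013\n"))
```

In this sketch the search returns the fewest conversion steps, which matters because each extra open/save step is another chance to lose information. The registry lets new analysis tools be added without changing the code that calls them.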

McHenry likens these two services to the Domain Name System (DNS), which makes the Internet easily navigable by translating domain names, like CNN.com, into the numerical IP addresses needed to locate computer devices and services and the information they provide.

"The two services we're developing are like a DNS for data, translating inaccessible un-curated data into information," he says.

Rather than starting from scratch and constructing a single piece of software, the NCSA team is building on their previous software development work and aims to use every possible source of automatable help already in existence. By patching together these various components, they plan to build a "super mutt" of software, which they call "Brown Dog."

The initial targets for the software are projects in geoscience, biology, engineering, and social science, but McHenry says the software could also be broadly useful to help manage individuals' ever-growing collections of photos, videos, and unstructured/un-curated data on the web.

Project collaborators include Barbara Minsker, professor and Arthur and Virginia Nauman Faculty Scholar and Faculty Affiliate at NCSA, and Praveen Kumar, Lovell Professor, both in the Department of Civil and Environmental Engineering at the University of Illinois at Urbana-Champaign; Michael Dietze, assistant professor in the Department of Earth and Environment at Boston University; and Richard Marciano, director of the Sustainable Archives & Leveraging Technologies lab at the University of North Carolina at Chapel Hill. The team also includes William Sullivan, Arthur Schmidt, and Jerome McDonough at the University of Illinois at Urbana-Champaign; Jay Alameda at NCSA; Luigi Marini and Rob Kooper as senior development staff and software architects within ISDA; and Dave Mattson as project manager.

For more information about the Image and Spatial Data Analysis group at NCSA, see http://isda.ncsa.illinois.edu/.