Data and Information Science and Technologies

NCSA is at the forefront of data-intensive research.

NCSA and its collaborators provide a data management framework for the Dark Energy Survey, which is using the world's largest digital camera to undertake the largest galaxy survey ever attempted. This framework processes, calibrates, and archives the massive amounts of data—quadrillions of bytes over the lifetime of the survey—that will be collected for the DES.

NCSA is embarking on a $10 million effort (referred to as "Brown Dog") to develop software to manage and make sense of vast amounts of digital data. While technology has made it easy for everyone to create and share digital data (including images, video, and audio), searching, sorting, accessing, and understanding that data remains very challenging. Among the major issues are a lack of metadata (the data about the data that describe when, how, and by whom it was produced) and the difficulty of accessing data stored in outdated formats. The NCSA team will develop services to make the contents of uncurated data collections accessible.

The Large Synoptic Survey Telescope (LSST) project will use an 8.4-meter telescope and a 3-gigapixel camera to produce a wide-field astronomical survey of the universe that tracks its changes over time. In addition to probing the mysteries of dark energy and dark matter, LSST is designed to detect exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. LSST will collect tens of terabytes of data every night, which will be processed, calibrated, and archived by NCSA.

NCSA and the University of Illinois are also at the forefront of efforts to develop a National Data Service (NDS), which would provide infrastructure to enable researchers across all disciplines to easily find, reuse, and publish data. More than 70 representatives from organizations across the United States and around the world gathered in Boulder, Colorado, in June to discuss the NDS and to organize a consortium to turn their vision into reality. The NDS consortium is an international federation of data providers, data aggregators, community-specific federations, publishers, and cyberinfrastructure providers.

A new Materials Data Facility is being established as a pilot program under NDS. This data facility will provide a repository where scientists can preserve and share materials research data produced by both simulations and experiments. Through the facility’s cloud-hosted data publication and discovery services, materials research projects will have an essential platform for rapid data sharing, discovery, and analysis that will accelerate the process of bringing new materials into industrial use. Key components of the facility will include multi-petabyte storage environments at NCSA and at Argonne National Laboratory, as well as the Globus research data management service operated by the University of Chicago.

On the Urbana-Champaign campus, NCSA is a partner in the Research Data Service (RDS), which was recently launched to provide the Illinois research community with the expertise, tools, and infrastructure necessary to manage and steward research data. The RDS assists researchers in developing and implementing data management plans, not only to comply with the growing requirements for federally funded projects but also to support continuity and efficiency within research groups. Headquartered at the University Library, RDS is a partnership among several campus units; NCSA will contribute to developing storage infrastructure and solutions for making Illinois research data publicly available.

NCSA's Integrated Data and Database Services (IDDS) team offers expertise to partners in science and industry, along with resource allocation services. IDDS creates custom applications to provide and manage resource allocations on high-performance computing (HPC), high-throughput computing (HTC), High Performance Storage System (HPSS), and other systems. The team also specializes in database administration for both large-scale (petabyte) and small-scale installations, providing server specification and development consulting, database design and development, interface development, and performance optimization.

NCSA is also the host institution for the Midwest Big Data Innovation Hub, an NSF-funded organization that serves as a coordinating entity for data science activities in the 12-state Midwest region. The Hub convenes a diverse network of cross-sector partners from academia, industry, government, and nonprofits to build regional capacity in data science through workshops, webinars, hands-on training, and other activities. The Hub is part of the National Science Foundation's national network of regional Big Data Innovation Hubs.