New NCSA team to focus on data challenges

11.08.12

A new team at the National Center for Supercomputing Applications will tackle the challenges of extracting meaningful insights and innovation from science and engineering data in a wide variety of disciplines. The Data Cyberinfrastructure Directorate, led by NCSA Deputy Director and Chief Technology Officer Rob Pennington, unites NCSA projects, personnel, and capabilities focused on data-driven science.

"Science and engineering are being revolutionized by the increasingly large amounts and diverse types of data flowing from new technologies, such as digital cameras in astronomy, highly automated sequencers in biology, and the detailed simulations enabled by the new generation of petascale computers, including NCSA's Blue Waters," says NCSA Director Thom Dunning. "The knowledge gained from data-driven discovery is already transforming our understanding of natural and societal phenomena and the future holds even greater promise."

"Information is essential to the performance of science," Pennington says. "This Data Cyberinfrastructure team will be a bridge between data and the research and educational uses of the information that comes from that data. We'll address aspects of the complexity that exists between data and discovery."

Data-driven discovery builds on advanced information systems to collect, transport, store, manage, integrate, and analyze raw data and the resultant data products, making them accessible, searchable, and usable by the wider community of scientists and engineers. There are also challenging questions about data access, security, and long-term curation.

"Many disciplines and projects face the same or very similar issues, so it is more productive for the researchers if we leverage solutions across multiple domains rather than each community separately grappling with its data challenges in isolation," Pennington says.

NCSA's Blue Waters supercomputer provides a massive data infrastructure, including 1.5 petabytes of aggregate system memory, usable storage bandwidth of more than 1 terabyte per second for the 25-petabyte online data system, and a 300-petabyte automated near-line storage environment. This powerful system enables researchers to tackle the most challenging big data problems. Researchers are counting on the NCSA data cyberinfrastructure to enable groundbreaking science and engineering projects.

Initially the Data Cyberinfrastructure Directorate will focus on five areas of science and engineering where NCSA and the University of Illinois have strengths: Astronomy, Biomedicine, Sustainability, Industry, and Geographic Information Systems. Major projects include:

  • Using a 570-megapixel camera, the international Dark Energy Survey collaboration will undertake the largest galaxy survey ever attempted. NCSA leads efforts in processing, calibrating, and archiving the data DES will collect each night, yielding approximately 2 petabytes of data over the lifetime of the survey. Scientists will use this wealth of data to carry out four probes of dark energy, the first time all four methods will be possible in a single experiment.
  • The Large Synoptic Survey Telescope will use an 8.4-meter telescope and 3-gigapixel camera to produce a wide-field astronomical survey of the universe that tracks its changes over time; in addition to probing the mysteries of dark energy and dark matter, LSST is designed to detect exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. LSST will collect tens of terabytes of data every night, which will be processed, calibrated, and archived by NCSA.
  • With the Mayo Clinic, NCSA researchers explore projects in computational medicine, blending Illinois' strengths in computational analysis, bioinformatics, and big computing and data with Mayo's medical informatics and clinical practice expertise.
  • NCSA is a key partner in the effort to develop and deploy the Illinois Shared Learning Environment (ISLE). ISLE will bring together a wealth of data about Illinois' students to help teachers improve instructional practices, decision-making, and student outcomes.

Pennington says there are valuable synergies between the new data team and NCSA's efforts in extreme-scale computing (such as the petascale Blue Waters supercomputer) and distributed cyberinfrastructure (like XSEDE). "This new scientific data focus will be one more integrative component among NCSA's other programs and capabilities," Pennington says.

The Data Cyberinfrastructure Directorate has strong collaborations with researchers at the University of Illinois at Urbana-Champaign and with national and international teams.

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.

XSEDE is supported by the National Science Foundation through award ACI-1053575.