NCSA puts world’s largest High Performance Storage System into production

05.30.13

A massive 380-petabyte High Performance Storage System (HPSS), the world’s largest automated near-line data repository for open science, is now in full service production as part of the Blue Waters project at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

The HPSS environment consists of multiple automated tape libraries, dozens of high-performance data movers, a large 40 Gigabit Ethernet network, hundreds of high-performance tape drives, and about 100,000 tape cartridges. This “big data” capacity is available to scientists and engineers using the sustained-petascale Blue Waters supercomputer. The storage system can be easily expanded and extended to accommodate the extreme data needs of other science, engineering, or industry projects.

“With the world’s largest HPSS now in production, Blue Waters truly is the most data-focused, data-intensive system available to the U.S. science and engineering community,” said Blue Waters deputy project director Bill Kramer.

The HPSS hierarchical file system software is designed to efficiently manage the access and storage of hundreds of petabytes of data at high data rates. HPSS manages the life cycle of data by moving inactive data to tape and retrieving it the next time it is referenced. The highly scalable HPSS is the result of two decades of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide.
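
To make the life-cycle idea concrete, here is a minimal sketch of a two-tier (disk plus tape) store that migrates idle files to tape and recalls them on the next read. The class name, method names, and the inactivity-based policy are all hypothetical illustrations; this is not the HPSS API, and HPSS’s actual migration policies are not described in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TwoTierStore:
    """Toy disk-plus-tape hierarchy (hypothetical; not the HPSS API)."""

    migrate_after: float  # assumed policy: seconds of inactivity before migration
    disk: Dict[str, bytes] = field(default_factory=dict)
    tape: Dict[str, bytes] = field(default_factory=dict)
    last_access: Dict[str, float] = field(default_factory=dict)

    def write(self, name: str, data: bytes, now: float) -> None:
        # New or updated data lands on the fast (disk) tier.
        self.disk[name] = data
        self.tape.pop(name, None)  # drop any stale tape copy
        self.last_access[name] = now

    def read(self, name: str, now: float) -> bytes:
        if name not in self.disk:
            # Recall: stage the file back from tape when it is referenced again.
            self.disk[name] = self.tape.pop(name)
        self.last_access[name] = now
        return self.disk[name]

    def migrate(self, now: float) -> List[str]:
        # Move every file idle for at least migrate_after seconds to tape.
        moved = []
        for name in list(self.disk):
            if now - self.last_access[name] >= self.migrate_after:
                self.tape[name] = self.disk.pop(name)
                moved.append(name)
        return moved
```

For example, with `migrate_after=3600`, a file written at time 0 is moved to tape by `migrate(now=4000)`, while one written at time 3000 stays on disk; a later `read` of the first file transparently stages it back. A production hierarchical storage manager would usually copy data to tape first and purge the disk copy later; the sketch collapses those steps into a single move for brevity.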

During pre-production acceptance testing, the HPSS deployment at NCSA:

  • Demonstrated constant file ingest and retrieval performance, independent of the number of files in the system, with more than 5 billion files in a single namespace.
  • Ingested 426 terabytes and retrieved 499 terabytes of data in 24 hours (averaging a total throughput of 38.5 terabytes per hour).

During its first weeks in production, NCSA’s HPSS has sustained an average ingest rate of 5.5 terabytes per hour (132 terabytes per day).
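
The throughput figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted in the article; the conversion to gigabytes per second assumes decimal units (1 TB = 1,000 GB), which the article does not specify.

```python
# Figures quoted in the article (terabytes and hours).
INGESTED_TB = 426
RETRIEVED_TB = 499
TEST_HOURS = 24
PRODUCTION_INGEST_TB_PER_HOUR = 5.5


def average_rate(total_tb: float, hours: float) -> float:
    """Average throughput in TB per hour."""
    return total_tb / hours


# Acceptance test: ingest and retrieval combined over one day (925 TB / 24 h).
combined = average_rate(INGESTED_TB + RETRIEVED_TB, TEST_HOURS)
# Production: sustained hourly ingest rate scaled to a full day.
daily_ingest = PRODUCTION_INGEST_TB_PER_HOUR * 24
# Assumes decimal units: 1 TB = 1000 GB, 1 hour = 3600 s.
combined_gb_per_s = combined * 1000 / 3600

print(f"acceptance test: {combined:.1f} TB/h (~{combined_gb_per_s:.1f} GB/s)")
print(f"production ingest: {daily_ingest:.0f} TB/day")
```

Running it reproduces the article’s 38.5 terabytes per hour and 132 terabytes per day.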

Michelle Butler, NCSA’s senior technical program manager for storage and networking, recalls how the pace of growth has changed over the more than 25 years the center has run archives and near-line systems: “It took us 19 years to reach our first petabyte and an additional year to accumulate the second petabyte. For the Blue Waters system, we had our first petabyte in just two weeks.”

NCSA joined forces with the HPSS Collaboration’s Department of Energy labs and IBM to develop an HPSS capability for Redundant Arrays of Independent Tapes (RAIT), a tape technology similar to RAID for disk. By generating parity blocks, RAIT protects data against single or dual points of failure while dramatically reducing the total cost of ownership and energy use of storing it. It also improves storage and retrieval performance, because data is striped across multiple tapes and read and written in parallel.
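
The parity idea behind RAIT can be illustrated with a minimal sketch of single-parity XOR striping, in which each block would go to a different tape. This is a generic RAID-5-style illustration under stated assumptions, not HPSS’s RAIT implementation: the function names and block layout are hypothetical, and this sketch tolerates only one lost block. Surviving two failures, as the article describes, requires a second, independent parity block computed with a more general erasure code, which is not shown here.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data: bytes, n_data: int, block_size: int) -> List[bytes]:
    """Split data into n_data zero-padded blocks plus one XOR parity block.

    In a RAIT-style layout each returned block would be written to a
    separate tape drive, so all n_data + 1 writes can proceed in parallel.
    """
    if len(data) > n_data * block_size:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(n_data * block_size, b"\x00")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripes: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing block (marked None) from the survivors."""
    missing = [i for i, b in enumerate(stripes) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    restored = list(stripes)
    if missing:
        # XOR of all surviving blocks (data and parity) equals the lost block.
        restored[missing[0]] = xor_blocks([b for b in stripes if b is not None])
    return restored
```

For instance, `stripe(b"blue waters", n_data=4, block_size=3)` yields four data blocks and one parity block; if any single block is replaced by `None`, `rebuild` returns the original five blocks.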

“With RAIT, we get data integrity through parity striping with only one-fifth the cost in extra tape cartridges, and equally important to us, with only one-fifth of the extra floor space, tape libraries, and power that traditional mirroring would require,” Butler says.
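
Butler’s one-fifth figure is a ratio of overheads: mirroring stores one extra cartridge per data cartridge, while a parity stripe stores only a fraction as much. The sketch below computes that ratio for two hypothetical stripe widths, 5 data + 1 parity and 10 data + 2 parity, both of which give one-fifth. The article does not state NCSA’s actual stripe geometry, so these widths are illustrative assumptions only; of the two, only the layout with two parity blocks could protect against the dual failures mentioned above.

```python
from fractions import Fraction


def mirroring_overhead() -> Fraction:
    # A full second copy: one extra cartridge for every data cartridge.
    return Fraction(1, 1)


def parity_overhead(n_data: int, n_parity: int) -> Fraction:
    # Extra (parity) cartridges per data cartridge in an n_data + n_parity stripe.
    return Fraction(n_parity, n_data)


# Hypothetical stripe widths; not NCSA's documented configuration.
for n_data, n_parity in [(5, 1), (10, 2)]:
    ratio = parity_overhead(n_data, n_parity) / mirroring_overhead()
    print(f"{n_data}+{n_parity}: {ratio} of mirroring's extra cartridges")
```

Using `Fraction` keeps the ratios exact, so both widths print `1/5` rather than a rounded decimal.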

“The HPSS collaboration is very pleased to be working closely with the NCSA Blue Waters team to continue delivering world-leading big data storage capabilities,” says Buddy Bland, Oak Ridge Leadership Computing Facility project director and member of the HPSS Collaboration Executive Committee. “The size of NCSA’s HPSS, the largest deployed, demonstrates its scalability and performance leadership. Its new RAIT capability will reduce tape costs and allow for very reliable, high-performance, striped tape I/O.”

For more information about Blue Waters, visit bluewaters.ncsa.illinois.edu.


Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.