Behind the scenes of NCSA: Data aces

09.09.11 -

The computer information storage field, both hardware and software, has changed immensely through the years. In the early days there weren't many options: big platters, slow access arms, and even slower manually mounted tapes. File system performance and archival storage weren't glamorous like the supercomputers; they were merely background services. Storage now receives far more attention because data has grown in importance, driven by ever-increasing computing power and the massive amounts of data today's supercomputers can both consume and generate. Balancing data speeds with computing power is the name of the game!

The Storage Enabling Technologies group, comprising Jason Alt, Michelle Butler, Jim Glasgow, Chad Kerner, Andy Loftus, and Alex Parga, is responsible for reliably storing the data of NCSA's supercomputer users. The group maintains the storage subsystems on the supercomputers and other research environments at the center, which includes the disk hardware, the networks the storage environment attaches to, the host connection configurations, the file systems and their implementation, backup and disaster recovery of the data, and archive connectivity, usability, and performance for those systems.

Their work is performed quietly in the background so scientists can get their work done. Fast, reliable, fault-tolerant storage environments are critical in today's high-performance computing environments; supercomputers aren't supposed to sit idle waiting for data to arrive.

Successful data storage requires research, technology, and engineering. NCSA has played key roles in pioneering new storage architectures, integrating cutting-edge technologies into its systems: RAID (Redundant Array of Independent Disks) deployment configurations for data reliability, input/output (I/O) server failover for data availability, a wide variety of file systems and configurations depending on data requirements, and data striping algorithms for performance.
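The idea behind data striping is to split a file into blocks and spread those blocks across several disks so they can be read or written in parallel. A minimal round-robin sketch follows; the block size and disk count are made-up illustrative parameters, not details of NCSA's actual systems:

```python
# Minimal sketch of round-robin data striping (illustrative only;
# block size and disk count are hypothetical parameters).
def stripe(data: bytes, num_disks: int = 4, block_size: int = 8):
    """Split data into fixed-size blocks and assign them
    round-robin across the disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), block_size):
        # Block k goes to disk k mod num_disks.
        disks[(i // block_size) % num_disks] += data[i:i + block_size]
    return disks

# 32 bytes striped as four 8-byte blocks, one per disk.
striped = stripe(b"ABCDEFGH" * 4, num_disks=4, block_size=8)
```

Because consecutive blocks land on different disks, a sequential read can pull from all four drives at once, which is where the performance benefit comes from.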

"A huge change that has occurred in recent years is that we've realized storage needs to be designed not just for the computational system but for the application," says manager Michelle Butler. "Modern information storage designs don't come shrink-wrapped and ready to roll. It takes time and expertise to do the research, determine the optimal configurations, and integrate everything together to meet both the performance and the reliability requirements."

Just as they've seen changes in the medium used to accomplish the storage task, they've also seen changes in the volume of data to be stored.

NCSA users have access to an archive system to store their working data. It took 19 years to reach one petabyte of stored data in that archive, and then just one year to archive a second petabyte. The archive currently stands at 8 PB and grows 75 to 100 percent each year.
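Those growth rates compound quickly. A back-of-the-envelope projection from the quoted figures (the three-year horizon here is an arbitrary choice for illustration):

```python
# Project archive size from the quoted 8 PB starting point at the
# quoted 75-100 percent annual growth rates (illustrative horizon).
def project(start_pb: float, annual_rate: float, years: int) -> float:
    """Compound the starting size by the annual growth rate."""
    return start_pb * (1 + annual_rate) ** years

low  = project(8, 0.75, 3)   # 75% per year -> ~42.9 PB after 3 years
high = project(8, 1.00, 3)   # 100% per year (doubling) -> 64 PB
```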

The Stone Age
Humans first began recording important information on the walls of caves. Looking for a more portable medium, they progressed to chiseling on rocks.

The Paper Age
But rocks were heavy and cumbersome. After some experimentation, scribes began to use scrolls of papyrus to record important information.

The Tape Era
As late-20th-century businesses embraced the computer, a technological version of the filing cabinet was needed. Saving data to magnetic tape became the norm: first round reel-to-reel tape, then the square cartridges still in use today.

The Portable Medium
Magnetic tape requires special tape readers. Enter the floppy disk: share information simply by carrying your floppy to any computer and inserting it.

The CD Revolution
Floppies began to fade as drives were developed that could read, and eventually write, CDs holding far more data.

The Solid State and More to Come...
Around 2000, new technologies coupled with modern flash memory. Removable USB storage devices became popular for personal use, while large information storage systems increasingly rely on solid-state drives, which offer the speed of computer memory without its volatility: the data stays written even if power is interrupted.