released 12.01.08
By Michelle Butler
Technical Program Manager
NCSA Storage Enabling Technologies Group
mbutler@ncsa.uiuc.edu
The information storage field, both hardware and software, has changed immensely during the 19 years that I've been with NCSA. Back in 1990, storage didn't have a whole lot of options; it was big platters, slow access arms, and even slower manually mounted tapes. File system performance and archival storage were not glamorous like the supercomputers, merely a background service. Recently, storage has received more attention as data has grown in importance, driven by ever-increasing computing power and the massive amounts of data today's supercomputers can both consume and generate. Balancing data speeds with computing power is now the name of the game!
Performance
I've always thought of my job as something that's performed quietly in the background so scientists can get their work done. If someone notices the storage team working, then we aren't doing our jobs properly. Worse, we have potentially impacted scientists and engineers by keeping them from doing their research. Fast, reliable, fault-tolerant storage environments are critical in today's high-performance computing environments. Supercomputers aren't supposed to be sitting idle waiting for data to arrive at the processor.
Storage's recent emergence as a priority demands research, technology, and engineering. NCSA has played key roles in pioneering new storage architectures, integrating cutting-edge technologies into our systems: RAID (Redundant Array of Independent Disks) for data reliability, input/output (I/O) server failover for data availability, and data striping algorithms for performance (striping is the process of breaking up the data into chunks and distributing them across multiple disks). In fact, the storage architecture is an integral part of the design of the Blue Waters sustained-petascale system; the machine will not reach its performance goals without a balanced storage system. That system will bring its own leaps in data management and ease of use that don't exist today, pushing storage to new levels.
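To make the idea of striping concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme, fixed-size chunks dealt round-robin across disks; the function names `stripe` and `unstripe` are made up for illustration and do not describe any particular file system's algorithm.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    if num_disks < 1 or chunk_size < 1:
        raise ValueError("num_disks and chunk_size must be positive")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading chunks back in round-robin order."""
    depth = max((len(disk) for disk in disks), default=0)
    chunks = []
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
# disks[0] == [b"AB", b"GH"], disks[1] == [b"CD", b"IJ"], disks[2] == [b"EF"]
```

Because consecutive chunks land on different disks, a large read or write can keep every disk busy at once, which is where the performance gain comes from.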
Reliability
Ranking right up there with performance is reliability. In the storage world we use the term "five nines," meaning "uptime" of 99.999 percent, which translates into a little more than five minutes of downtime per year. (Nine hours a year, by comparison, is roughly what "three nines," or 99.9 percent, allows.) It is important to note that this downtime budget covers everything that can take a storage or compute system down: hardware meltdowns, software bugs, and operational errors.
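The link between an availability percentage and the downtime it allows is plain arithmetic, sketched below in Python. The function name is made up for illustration, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system may be down at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for label, percent in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    hours = downtime_hours_per_year(percent)
    print(f"{label} ({percent}%): {hours:.4f} hours, or {hours * 60:.1f} minutes, per year")
```

Each additional nine cuts the allowed downtime by a factor of ten: about 8.76 hours at three nines, about 53 minutes at four, and a little over 5 minutes at five.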
The more able the system is to keep running when something fails or goes wrong (that is, the more reliable and fault tolerant it is), the more expensive it is. Archival tape storage isn't sexy, but it's still the cheapest way to store petabytes of data that don't require instantaneous retrieval. This hasn't changed much over the years, nor has the fact that people don't like using tape storage devices, primarily because users are left waiting for their data. In recent years NCSA has built large data caches on the front of the archive server so the most active data lives on the secondary disk cache, minimizing both tape access and data retrieval time. Our current ingest rate to the archive server is 80-100 terabytes of data every month.
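The idea of a disk cache in front of a tape archive can be sketched in a few lines of Python. This is only an illustration: it assumes a least-recently-used eviction policy, which may not be what the archive server actually uses, and the class `ArchiveCache` and the `fetch_from_tape` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class ArchiveCache:
    """Disk cache in front of a slow archive, evicting least-recently-used files."""

    def __init__(self, capacity_bytes: int, fetch_from_tape: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch_from_tape = fetch_from_tape  # hypothetical slow tape read
        self.files: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.tape_reads = 0

    def read(self, name: str) -> bytes:
        if name in self.files:
            # Cache hit: serve from disk and mark the file as recently used.
            self.files.move_to_end(name)
            return self.files[name]
        # Cache miss: the user waits for tape, then the file is kept on disk.
        data = self.fetch_from_tape(name)
        self.tape_reads += 1
        self._store(name, data)
        return data

    def _store(self, name: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # too large to cache at all
        while self.used + len(data) > self.capacity:
            _, evicted = self.files.popitem(last=False)  # drop the oldest file
            self.used -= len(evicted)
        self.files[name] = data
        self.used += len(data)


tape = {"run1.dat": b"x" * 4, "run2.dat": b"y" * 4}
cache = ArchiveCache(capacity_bytes=8, fetch_from_tape=tape.__getitem__)
cache.read("run1.dat")  # first read goes to tape
cache.read("run1.dat")  # second read is served from the disk cache
```

The more of the active data set that fits in the cache, the less often a read has to wait on a tape mount.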
New approaches
A huge change that has occurred in recent years, however, is that we've realized the same "shoe" doesn't fit on every "foot," and storage needs to be designed not just for the computational system but for the application. One of our largest production machines has over half a petabyte of disk storage using a parallel file system engineered for extreme speed on HPC storage devices. Another system has a little more than 200 terabytes, utilizing a parallel file system on storage hardware that offers 99.999 percent reliability with additional I/O server failover already built in, engineered with a focus on high reliability and availability. Across our machines we use different devices, communication paths, and file systems, but they all meet the same basic need: storage.
In the old days we generally could purchase disks, tape systems, and software and we were good to go. Modern information storage designs don't come shrink wrapped and ready to roll. It takes time and expertise to research devices and software, determine the optimal configurations, and integrate it all together to meet both the performance and the reliability requirements. Storage environments take a long time to understand.
RAID is a technology that has emerged in recent years and is now commonplace. It changed the face of storage completely, adding increased data reliability and availability. Choosing to go with a RAID technology has pros and cons. Utilizing RAID allows data to be striped across multiple disks and benefits both performance and reliability (hopefully). The RAID can be internal or external to a host system. An internal configuration would be considered a software-based RAID because the operating system is doing all the calculations and writing the data. A hardware-based RAID is an external device that does all the work for the host without the host knowing about it. The cost of a RAID storage environment is directly affected by the level or type of RAID, and by the reliability, availability, and performance features desired. For example, RAID0 stripes data across devices without adding any reliability, so it increases performance alone, and a lost disk still means data loss. RAID1 mirrors the data, increasing reliability but doing little for performance. It has the potentially negative impact of doubling storage costs, since you now have two copies of all your data, and possibly two write operations an application has to wait for. RAID3 and RAID5 stripe the data and add parity (on a dedicated drive in RAID3, distributed across the drives in RAID5) that can be used to rebuild the data when you have a disk failure. Cost increases depending on how many parity drives are built into the data stripe for reliability. RAID6 uses two drives' worth of parity, specifically to support large deployments of SATA drives, which have a high failure rate. The Blue Waters system will use a whole different level of RAID technology due to the sheer number of disk drives and possible failures within multiple layers.
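How parity lets a RAID rebuild a lost disk, and what it costs in capacity, can be shown with a short Python sketch. It covers only the single-parity case, where the parity block is the bytewise XOR of the data blocks; the second parity in RAID6 uses a different calculation that is not shown. The function names are made up for illustration, and the RAID1 figure assumes two-way mirroring.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks; used to compute single parity."""
    if not blocks or len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


def usable_fraction(level: int, num_disks: int) -> float:
    """Fraction of raw capacity left for data at a given RAID level."""
    parity_disks = {0: 0, 3: 1, 5: 1, 6: 2}
    if level == 1:
        return 0.5  # assumes two-way mirroring
    if level not in parity_disks or num_disks <= parity_disks[level]:
        raise ValueError("unsupported level or too few disks")
    return (num_disks - parity_disks[level]) / num_disks


d0, d1, d2 = b"NCSA", b"data", b"tape"
parity = xor_blocks([d0, d1, d2])
recovered = rebuild([d0, d2], parity)  # pretend the disk holding d1 failed
```

In this example `recovered` equals `d1`, because XOR-ing the parity with the surviving blocks cancels them out and leaves the missing one. With eight disks, `usable_fraction` gives 0.875 for RAID5 and 0.75 for RAID6, which is the capacity price paid for the extra protection.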
One thing that constantly amazes me is the size of the datasets in today's research. When I started at NCSA, the Cray systems we were using had about 300 gigabytes of storage on them and the archive server had two terabytes of data. Today, we have 1.5 petabytes of locally attached disk, including seven SANs, and 4.8 petabytes of archival data. It took 19 years to store the first petabyte of data at NCSA, but the second petabyte came just one year later! Our archive system needs are growing by 75 to 100 percent every year. Storage systems on the supercomputers are constantly changing, while scientists and engineers continue to evolve their data requirements to take full advantage of the computing power available. This keeps our team busily buzzing in the background; hidden, but forgotten no more.