 |
There are currently two versions of HDF in use: HDF4 and HDF5. HDF4 is backward compatible with all earlier versions and is currently used by NASA. The newer version, HDF5, has been developed for new information technologies, such as parallel processors and extremely large files.
NCSA teamed with three Department of Energy (DoE) national laboratories (Lawrence Livermore, Los Alamos, and Sandia) to develop HDF5, and the DoE's ASCI program has also adopted the upgrade. Folk cites several improvements in HDF5 over previous versions. This version can handle files of unlimited size, whereas previous versions limited file size to two gigabytes. The HDF5 software also offers input and output for parallel computing environments. Creating, transferring, and retrieving files in parallel greatly speeds up the data transfer and storage process.
HDF5 is also faster because of its ability to organize huge datasets into highly regular subsets, a process known as chunking. Datasets, explains Folk, can be thought of as cubes of data. Many of the EOS data cubes contain information about the Earth within certain altitude, latitude, and longitude ranges. Some of these files are so large that they cover the entire Earth. Chunking divides a large data cube into many smaller cubes.
"With datasets becoming so large, accessing the information you need becomes an important issue," says Folk. "You don't want to spend all your time searching. With chunking, the software will search only the chunks that contain the data you've requested." This process can speed up data retrieval by a factor of as much as 100.
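The idea behind chunking can be illustrated with a little index arithmetic. The Python sketch below does not use the HDF5 library and is not how HDF5 itself is implemented. It only shows, under assumed numbers, how a request for a small region of a data cube maps onto a handful of chunks. The cube dimensions, the chunk size, and the function names `chunks_for_request` and `total_chunks` are all hypothetical, chosen for illustration.

```python
from itertools import product
from math import ceil


def chunks_for_request(chunk_shape, start, stop):
    """Return grid indices of every chunk overlapping the half-open box [start, stop)."""
    ranges = [
        # First and last chunk touched along this axis.
        range(lo // c, (hi - 1) // c + 1)
        for c, lo, hi in zip(chunk_shape, start, stop)
    ]
    return list(product(*ranges))


def total_chunks(dataset_shape, chunk_shape):
    """Count the chunks needed to tile the whole dataset."""
    n = 1
    for size, c in zip(dataset_shape, chunk_shape):
        n *= ceil(size / c)
    return n


# Hypothetical data cube: altitude x latitude x longitude cells (assumed sizes).
dataset_shape = (40, 1800, 3600)
# Hypothetical chunk size: each chunk is a small cube of the larger cube.
chunk_shape = (10, 100, 100)

# Request every altitude level over a small latitude/longitude window.
needed = chunks_for_request(
    chunk_shape, start=(0, 850, 1700), stop=(40, 950, 1900)
)
print(len(needed), "of", total_chunks(dataset_shape, chunk_shape), "chunks touched")
```

With these assumed numbers, the request touches only a small fraction of the chunks in the cube, so the rest of the file never has to be read. The actual speedup in any real system depends on the chunk layout and on how well it matches the access pattern.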
|