Data Migration to NCSA from PSC:

Transfer Rate Characteristics

[[ DRAFT - Feb 19, 1998 ]]

Robert L. Pennington, Tony Rimovsky, and Michelle Butler

Computing and Communications Division, NCSA

 

Executive Summary

Over the coming months, it will be necessary to transfer large amounts of data to the NCSA from the PSC archive as users migrate to the NCSA and SDSC as part of the PACI program. These users have in excess of 12 terabytes of data in the PSC archive. The task of moving this data for the users is the responsibility of the staffs of the respective centers. The chosen method for this massive data movement is to pack the data up on a user-by-user basis at the PSC, transfer it to NCSA across the vBNS, and place it into the NCSA mass storage system.

We have created a software and hardware system at the NCSA that is capable of accepting, verifying and storing a user's aggregated data files transferred from PSC in an automated mode with minimal manual intervention. This work has been coordinated with PSC staff, who have created a system on their end that packs and stages the aggregated data files as a series of ~2GB cpio archives on a per user basis. These are placed on a PSC system from which we collect them using NcFTP across the vBNS.

The transfers are driven from the NCSA side by a production high-performance SGI system that also carries significant additional duties. Two different SGI systems were used over the course of the transfers. The initial system, an R4400-based machine, was marginally able to maintain the throughput needed for continuous transfers. The final system, an R10000-based machine, was easily able to support all of its production load as well as the transfers. The data is stored into the recently deployed Convex SPP-2000 UniTree production mass storage system, which concurrently handles all of the mass storage needs of NCSA's computational users.

The transfers took place across each site's internal HiPPI networks and the vBNS. For the last 50 cpio archive files, representing approximately 90 GB of data, the raw FTP transfers across the vBNS from the PSC to NCSA using the R10000 SGI system averaged 175 seconds per file, or 9.9 MB/s. The effective transfer rate to NCSA was significantly lower for these files, requiring an average of 264 seconds per cpio archive file, or roughly 6.8 MB/s. The difference is primarily the time needed to stage the data on the PSC end, an average of 77 seconds per file, plus roughly 11 seconds on the NCSA side to initiate the transfer, retrieve the checksum files from PSC and perform disk space checks.

The effective transfer rate from the PSC using the R4400 system was 5.8 MB/s for the first 65 cpio archive files, showing that this system was a bottleneck during the first half of the transfers.

Transfers to the NCSA UniTree mass storage system were done asynchronously with the incoming transfers from PSC. The average rate for storing these 50 cpio archive files into UniTree was 8.4 MB/s, significantly above the effective incoming rate of 6.8 MB/s. It is worth noting that the store operations were performed almost exclusively during weekday afternoon and early evening hours, when the NCSA mass storage system is typically heavily used.

 

Introduction

The NSF PACI program has replaced the NSF Centers program as the vehicle for providing high performance computing resources to the NSF computational science and engineering community. The PACI sites are the Alliance, led by NCSA, and NPACI, led by SDSC. This change in the NSF high performance computing program entails the transition of current NSF-sponsored academic users at the Pittsburgh Supercomputing Center and Cornell Theory Center to one or both of the PACI sites.

The movement of users to a different center also requires the movement of any data that they have stored at the PSC or the CTC to NCSA or SDSC. As a result of an agreement between SDSC and CTC, the CTC HPSS system is to be physically moved in its entirety to SDSC, which will make the data available to the former CTC users.

The users' data in the PSC archive is being handled in a different manner. Principal Investigators with data in the archive must request that their data be moved to either NCSA or SDSC. These users have on the order of 12 TB of data stored in the current PSC FAR archive, not including the data stored in the previous archive, OFAR. For such a large volume of data, it would be impractical for each PI to transfer all of his or her own files in a reasonable amount of time. To obtain a rough estimate of the time required for users to transfer this quantity of data across the Internet, assume a very optimistic Internet transfer rate of 1 MB/s. Moving 12 TB at 1 MB/s would require 12,000,000 seconds, or roughly 140 days, and this does not include any of the time necessary to stage the files from tape. Obviously, users cannot be expected to perform this task entirely unassisted.
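As a sanity check on this estimate, the short calculation below reproduces the arithmetic (a minimal sketch; the 1 MB/s figure is the optimistic assumption stated above, and decimal units are used throughout):

    # Rough estimate of the time to move the PSC archive across the Internet,
    # assuming an optimistic sustained rate of 1 MB/s (decimal units).
    total_mb = 12 * 1000 * 1000        # 12 TB expressed in MB
    rate_mb_per_s = 1.0                # assumed sustained transfer rate
    seconds = total_mb / rate_mb_per_s
    days = seconds / 86400             # 86,400 seconds per day
    print(f"{seconds:,.0f} s  ~=  {days:.0f} days")   # 12,000,000 s, ~139 days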

It is therefore necessary for the staffs of NCSA, SDSC and PSC to coordinate their activities to transfer the files efficiently on behalf of the users. Upon receipt of a request to move a project's files to NCSA or SDSC (or potentially another site), the PSC aggregates the files for each PI into a series of manageable "packages" of approximately 2 GB each. These packed files are then transmitted across the vBNS to NCSA or SDSC. As an initial expectation, an average rate of 5 MB/s was considered feasible.
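The packing step itself is a PSC responsibility and its tooling is not described here; purely as an illustration of the idea, the sketch below partitions a user's file list into batches of approximately 2 GB, each of which would then be written to a single cpio archive (the function name, file-list format and exact threshold are assumptions, not the PSC implementation):

    # Illustrative sketch: group a user's files into ~2 GB batches, one batch
    # per cpio archive.  This is not the PSC tooling; names are assumptions.
    PACKAGE_LIMIT = 2 * 1000**3        # ~2 GB per package, as described above

    def make_packages(files):
        """files: iterable of (path, size_in_bytes); yields lists of paths."""
        batch, batch_size = [], 0
        for path, size in files:
            if batch and batch_size + size > PACKAGE_LIMIT:
                yield batch
                batch, batch_size = [], 0
            batch.append(path)
            batch_size += size
        if batch:
            yield batch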

In order for this process to work efficiently, there must be a nearly continuous stream of data out of the PSC to the NCSA. There are three critical performance areas that must be dealt with to make this possible for a large volume of data: the rate at which the data can be packed and staged at the PSC, the rate at which the cpio archives can be transferred across the vBNS, and the rate at which the archives can be verified and stored into the NCSA mass storage system.

 

NCSA System Configuration and Input Data Rates

 

The NCSA system was composed of a machine that fetched the data from the PSC, performed an assurance check on each received cpio archive and, concurrently with that check, stored the file into the NCSA mass storage system over the internal HiPPI network.
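A minimal sketch of this landing-zone flow is given below. The actual NCSA scripts are not reproduced here: the host name, local paths, store command and checksum format are illustrative assumptions, while the use of NcFTP, the checksum verification and the hand-off to mass storage follow the description above.

    # Illustrative sketch of the landing-zone flow: fetch a cpio archive and
    # its checksum from the PSC staging host, verify it, then hand it off to
    # the mass storage store step.  Host names, paths, the checksum format
    # and the store command are placeholders, not the production tooling.
    import hashlib
    import subprocess

    STAGING_HOST = "staging.psc.example"    # hypothetical PSC staging system
    LANDING_DIR = "/scratch/landingzone"    # hypothetical local buffer space

    def fetch(remote_path):
        # The transfers used NcFTP across the vBNS; the exact invocation is
        # site-specific, so this command line is a placeholder.
        subprocess.run(["ncftpget", STAGING_HOST, LANDING_DIR, remote_path],
                       check=True)

    def verify(local_path, expected_digest):
        # The text describes a simple checksum; MD5 is used here only as a
        # stand-in for whatever checksum the sites actually exchanged.
        h = hashlib.md5()
        with open(local_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected_digest

    def store_to_unitree(local_path):
        # Placeholder for the asynchronous store into the UniTree system
        # over the internal HiPPI network (site-specific command).
        subprocess.run(["store_to_mss", local_path], check=True)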

 

Two different systems were used at the NCSA to fetch the data from the PSC. The first was an 8-CPU 150 MHz R4400 SGI system that also acts as the primary NFS file server for the Power Challenge Array; a dedicated disk partition with 36 GB of space was set up on this machine to buffer the data. The first half of the data was transferred through this machine, and the second half through an 8-CPU 195 MHz R10000 SGI system with a large "common" scratch area. Figure 1 shows the FTP transfer rates into the SGI systems as a function of cpio archive number. The R10000 system clearly performed significantly faster than the R4400 system as it was configured. It should be noted that the filesystem used on the R4400 was not optimized for the transfers, while the filesystem on the R10000 was tuned for better performance. The scatter in the data for the R10000 system is probably because its filesystem was shared with production users at the time of the transfers.

The average FTP transfer rate to the R4400 system was 6.8 MB/s, and the rate clearly decreased with time. This was due to the increasing load that the transfers placed on the system as a small backlog of 4-6 files built up over the six-hour period of the transfers; the backlog consisted of files waiting to be checked and transferred into UniTree. Had the process continued for much more than six hours, the R4400 would probably have begun to fall behind and the throughput would have dropped. The rate for the R10000 system was considerably higher at 9.9 MB/s. There was no backlog, as this system was able to process the data and send it to UniTree at a higher rate than the data arrived from the PSC.

 

 

UniTree Store Rates

 

A new Convex SPP-2000 became the production mass storage system at the NCSA in January 1998. Pre-production testing showed the SPP-2000 system to be significantly faster than the C3-based UniTree system it replaced, as well as more reliable, and this has been confirmed in the weeks since deployment. The new system has been a great benefit to the transfer of data for users migrating to NCSA from the PSC. The figure below shows the transfer rates for storing the files from the SGI "landingzone" system to UniTree.

The rates from the R10000 system are clearly significantly higher than those from the R4400 system, which averaged 3.9 MB/s. The transfer rate into the UniTree system from the R10000 system averaged 8.4 MB/s. This rate is respectable considering that both systems, the SGI and the Convex/UniTree system, were concurrently being used by production users.

 

 

Effective Transfer Rate to the NCSA from the PSC using the vBNS

At first glance, the rate at which data flows out of the SGI "landingzone" into the NCSA mass storage system is lower than the rate at which it can be fetched over the vBNS using FTP. This is not currently an issue: overhead time that is not included in the raw FTP rates reduces the effective incoming rate to below the store rate, changing the sign of the difference.

The transfers were instrumented on the NCSA side to log the times for the various stages in the reception and processing of the cpio archives from the PSC, giving a clear view of the end-to-end performance of the system. The figure below shows the times necessary to deliver the last 50 cpio archive files to the R10000 SGI system, representing ~90 GB of data. The average time per cpio archive file is 264 seconds, almost 90 seconds longer than the time taken by FTP to deliver the file.

The additional time is overhead: staging the file on the PSC end accounts for an average of 77 seconds per cpio archive, while initiating and terminating the transfer from the NCSA side, performing disk space checks and other housekeeping tasks on the received file, and retrieving the checksum information from the PSC account for an additional 11 seconds. The resulting effective incoming rate of roughly 6.8 MB/s is still significantly less than the 8.4 MB/s rate at which the data is processed and transferred into the NCSA mass storage system.

The R4400 system was not quite able to keep up with the rate at which the PSC was able to supply data over the vBNS for the first 65 cpio archive files, requiring roughly 20,900 seconds for 121 GB of data, an effective rate of 5.8 MB/s.
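The effective rates quoted above follow directly from the logged times. The short calculation below reproduces the arithmetic, using the average figures from the text and an average archive size of ~1.8 GB (90 GB over 50 archives); because these are rounded averages, the raw FTP rate comes out slightly above the measured 9.9 MB/s.

    # Reproduce the effective-rate arithmetic from the averages quoted above
    # (decimal MB/GB; small differences are due to rounding).
    archive_mb = 90 * 1000 / 50            # ~1.8 GB average per cpio archive

    ftp_time = 175                         # raw FTP time per archive (s)
    psc_staging = 77                       # staging overhead at the PSC (s)
    ncsa_overhead = 11                     # setup, checksum fetch, disk checks (s)
    total_time = ftp_time + psc_staging + ncsa_overhead   # ~264 s per archive

    print(f"raw FTP rate:     {archive_mb / ftp_time:.1f} MB/s")    # ~10.3 MB/s
    print(f"effective rate:   {archive_mb / total_time:.1f} MB/s")  # ~6.8 MB/s

    # R4400 phase: first 65 archives, 121 GB in roughly 20,900 seconds.
    print(f"R4400 effective:  {121 * 1000 / 20900:.1f} MB/s")       # ~5.8 MB/s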

 

Summary

We have examined the transfer rate characteristics for the first user to have a significant amount of data (1/4 TB) migrated to the NCSA from the PSC. The study shows that the system in place at the NCSA, a "landingzone" SGI machine that receives the data, verifies its integrity with a simple checksum and transfers it to the NCSA mass storage system, is able to absorb the input data rate available using the vBNS to transfer data to NCSA from the PSC. Packing and staging the data on the PSC end is the slowest part of the system, which is not unreasonable given that the data being packed comes primarily from tape rather than disk storage.

The new NCSA mass storage system has proven itself capable of absorbing a continuous data rate of ~6 MB/s from an SGI R4400 system for a period of six hours, demonstrating its capacity by handling 121 GB faultlessly. It was able to maintain an average transfer rate of 8.4 MB/s when driven from a higher performance R10000 SGI system, although not for as long a period due to other, unrelated problems with the transfers. We do not expect any significant problems in storing 8.4 MB/s into the NCSA mass storage system for extended periods of time.

There are still a number of problems to be dealt with in terms of the reliability of the overall system and error recovery, but these are being worked out at the two sites individually and in collaboration. Overall, the system is stable enough that operation of the NCSA end will shortly be turned over to the Technology Management Group, which is responsible for the operation of the HPC systems at the NCSA.