
Mercury File System Overview

The NCSA Linux Intel IA64 cluster (Mercury) has a total of 891 compute nodes (256 Phase 1 nodes and 635 Phase 2 nodes), each containing two CPUs. Forty terabytes of scratch space are available to all compute nodes on a GPFS NSD–mounted parallel file system. An additional 90 TB of SAN-connected GPFS is available to the Phase 1 "fastio" resource.

Connectivity Schematic

File system and node connectivity are shown in the figure below. All compute, login and GridFTP nodes are served by the GPFS NSD servers and the NFS file system. GPFS SAN connectivity is limited to the fastio nodes, the login nodes and the GridFTP servers.

Note: Different segments of the compute node pool can be requested via the -l option of qsub. The special resource classes are shown above and are summarized here:

  • fastio: 1.3 GHz (256 total)
  • himem: 1.3 GHz with 12 GB RAM
  • fastcpu: 1.5 GHz (no SAN connectivity)

Jobs can also be submitted to run on any node in the cluster (997 total) by designating ia64-compute or compute.
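As a minimal sketch of how a resource class might be passed to the -l option of qsub, the helper below builds the request string and prints the resulting command instead of submitting it. The function name node_request and the script name myjob.pbs are hypothetical, and the nodes=N:class form for classes other than fastio is assumed from the fastio example used elsewhere on this page.

```shell
# Build the argument for qsub's -l option.
#   $1 = number of nodes
#   $2 = resource class named on this page
# Assumption: every class uses the same nodes=N:class form as fastio.
node_request() {
    case "$2" in
        fastio|himem|fastcpu|ia64-compute|compute)
            printf 'nodes=%s:%s\n' "$1" "$2"
            ;;
        *)
            echo "unknown resource class: $2" >&2
            return 1
            ;;
    esac
}

# Print the command rather than run it, so the sketch also works off-cluster.
# myjob.pbs is a hypothetical batch script.
echo qsub -l "$(node_request 16 himem)" myjob.pbs
```

Rejecting unknown class names before submission is a design choice of this sketch, not documented scheduler behavior.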

Mercury File System Information

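Name               | Location               | Environment Variable | Size          | Quota policy | Purge policy | Backup policy
-------------------+------------------------+----------------------+---------------+--------------+--------------+--------------
Home Directory     | /home/ncsa/$USER       | $TG_CLUSTER_HOME     | 5 GB per user | 5 GB         | N/A          | Daily
Scratch NFS        | /scratch/$USER         | $TG_CLUSTER_SCRATCH  | 1 TB          | None         | 7 days       | None
Scratch GPFS (NSD) | /gpfs_scratch1/$USER   | $TG_CLUSTER_GPFS     | 40 TB         | None         | 7 days       | None
Scratch GPFS (SAN) | /gpfs_sanscratch/$USER | None defined         | 90 TB         | None         | 7 days       | None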
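The environment variables and purge window listed above could be used from a script along the following lines. This is only a sketch: the function names scratch_dir and purge_candidates are hypothetical, it assumes $TG_CLUSTER_GPFS holds the /gpfs_scratch1/$USER path, and it assumes the 7-day purge is based on file modification time, which this page does not state.

```shell
# Resolve the GPFS (NSD) scratch directory: prefer $TG_CLUSTER_GPFS and fall
# back to the documented path. Assumes the variable holds that same path.
scratch_dir() {
    printf '%s\n' "${TG_CLUSTER_GPFS:-/gpfs_scratch1/${USER:-$(id -un)}}"
}

# List regular files last modified more than $2 whole days ago (default 5),
# as a rough early warning ahead of the 7-day purge.
# Assumption: the purge keys on modification time.
purge_candidates() {
    find "$1" -type f -mtime +"${2:-5}" -print
}

dir=$(scratch_dir)
echo "scratch directory: $dir"
if [ -d "$dir" ]; then
    purge_candidates "$dir"
fi
```

Files reported by purge_candidates are the ones to copy elsewhere first, since the scratch file systems are not backed up.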

Usage

Jobs that produce a large amount of output (hundreds of GB or more) may benefit from requesting the fastio nodes, which are served by the GPFS SAN file system. At 90 TB, this file system is more than twice the size of the 40 TB cluster-wide GPFS NSD file system, and it sustains higher throughput under parallel load. For more information, refer to NCSA TeraGrid User Guide: File Systems and Storage.

Important notes:

  • Home directories are mounted on an NFS file system. This arrangement allows high availability of the login nodes, but is not designed to handle parallel I/O. Do not direct output from parallel batch jobs to your home directory. This puts an unnecessary strain on our NFS servers and can slow down login node access for all users.
  • To perform I/O to the GPFS SAN file system (/gpfs_sanscratch) from a batch job, processes must be running on the fastio resource. The following is an example line from a batch script:
      #PBS -l nodes=32:fastio
    Failure to specify the fastio resource will result in output bouncing back to your /home directory, which can also cause the NFS problems described above and may result in data loss.
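The notes above can be turned into a defensive batch script. The sketch below requests the fastio resource and refuses to start the run unless the SAN scratch directory is usable, rather than letting output land in /home. The function require_writable_dir, the variable OUTDIR and the commented-out application are hypothetical, and the check assumes that /gpfs_sanscratch is simply absent or unwritable on nodes without SAN connectivity.

```shell
#!/bin/sh
#PBS -l nodes=32:fastio

# Succeed only if $1 is an existing, writable directory.
require_writable_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        return 0
    fi
    echo "error: $1 is not a writable directory on this node" >&2
    return 1
}

# OUTDIR is a hypothetical variable; the default is the SAN scratch path
# documented on this page.
OUTDIR="${OUTDIR:-/gpfs_sanscratch/${USER:-$(id -un)}}"

if require_writable_dir "$OUTDIR"; then
    echo "writing output under $OUTDIR"
    # (cd "$OUTDIR" && ./my_parallel_app > run.out)   # hypothetical application
else
    echo "not starting: was the fastio resource requested?" >&2
fi
```

Checking the directory up front makes a missing fastio request fail fast and visibly instead of straining the NFS servers.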

Performance Notes

The GPFS SAN file system, available on the fastio resource, can sustain reads and writes of more than 1 GB/s under a parallel load. The GPFS NSD file system typically levels off at around 500 MB/s.
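To put these rates in perspective, the following back-of-the-envelope sketch estimates how long a job would need to write a given volume at each sustained rate. The function name transfer_seconds and the 500 GB example size are illustrative assumptions; the calculation takes 1 GB = 1000 MB, treats the quoted rates as constant, and ignores contention from other jobs.

```shell
# Estimate the seconds needed to write $1 GB at a sustained $2 MB/s.
# Assumes 1 GB = 1000 MB; the result is rounded up to a whole second.
transfer_seconds() {
    echo $(( ($1 * 1000 + $2 - 1) / $2 ))
}

# Example: 500 GB of output at the two rates quoted above.
echo "GPFS SAN at about 1000 MB/s: $(transfer_seconds 500 1000) s"
echo "GPFS NSD at about 500 MB/s:  $(transfer_seconds 500 500) s"
```

For this example the estimate is 500 seconds on GPFS SAN and 1000 seconds on GPFS NSD. Because the SAN figure is quoted as "more than 1 GB/s", its estimate is closer to an upper bound.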