The NCSA Linux Intel IA64 cluster (Mercury) has a total of 891 compute
nodes (256 Phase 1 nodes and 635 Phase 2 nodes), each containing
two CPUs.
Forty terabytes of scratch space are available to all compute nodes on
a GPFS NSD–mounted parallel file system. An additional 90 TB of
SAN-connected GPFS is available to the Phase 1 "fastio" resource.
Connectivity Schematic
File system and node connectivity are shown in the figure below. All compute,
login, and GridFTP nodes are served by the GPFS NSD servers and the NFS file
system. GPFS SAN connectivity is limited to the fastio nodes, the login nodes,
and the GridFTP servers.
Note: Different segments of the compute node pool can be requested
via the -l option of qsub. The special resource classes are
shown in the figure and are summarized here:
- fastio: 1.3 GHz (256 total)
- himem: 1.3 GHz with 12 GB RAM
- fastcpu: 1.5 GHz (no SAN connectivity)
- ia64-compute or compute: any node on the cluster (997 total)
Jobs can also be submitted to run on any node by designating
ia64-compute or compute.
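As a minimal sketch of how a resource class might be passed to the -l option, the helper below builds a node request string and prints the resulting qsub command line instead of running it. The function name node_request and the script name job.pbs are hypothetical; only the nodes=N:class form and the class names come from this page.

```shell
# Build a PBS node request such as "nodes=32:fastio".
# node_request is a hypothetical helper, not part of PBS.
node_request() {
    count=$1
    class=${2:-compute}   # default to the "any node" class
    case "$class" in
        fastio|himem|fastcpu|compute|ia64-compute) ;;
        *) echo "unknown resource class: $class" >&2; return 1 ;;
    esac
    printf 'nodes=%s:%s\n' "$count" "$class"
}

# Print the command line rather than submitting anything.
# job.pbs is a placeholder script name.
echo "qsub -l $(node_request 32 fastio) job.pbs"
```

Rejecting unknown class names before submission is a convenience of this sketch, not something the batch system is documented here to require.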
Mercury File System Information
| Name | Location | Environment Variable | Size | Policy: Quota | Policy: Purge | Policy: Backup |
|---|---|---|---|---|---|---|
| Home Directory | /home/ncsa/$USER | $TG_CLUSTER_HOME | 5 GB per user | 5 GB | N/A | Daily |
| Scratch NFS | /scratch/$USER | $TG_CLUSTER_SCRATCH | 1 TB | None | 7 days | None |
| Scratch GPFS (NSD) | /gpfs_scratch1/$USER | $TG_CLUSTER_GPFS | 40 TB | None | 7 days | None |
| Scratch GPFS (SAN) | /gpfs_sanscratch/$USER | None defined | 90 TB | None | 7 days | None |
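A script may prefer to read the scratch location from the environment rather than hard-coding a path. The sketch below assumes that $TG_CLUSTER_GPFS, when set, points at the GPFS (NSD) scratch directory listed in the same table row, and falls back to the listed path otherwise. The function name gpfs_scratch_dir is hypothetical.

```shell
# Print the GPFS (NSD) scratch directory for the current user.
# Assumption: $TG_CLUSTER_GPFS, when set, names the same directory
# that the table lists as /gpfs_scratch1/$USER.
gpfs_scratch_dir() {
    printf '%s\n' "${TG_CLUSTER_GPFS:-/gpfs_scratch1/${USER:-$(id -un)}}"
}

echo "GPFS scratch: $(gpfs_scratch_dir)"
```

Because the table lists no environment variable for the GPFS SAN scratch space, that path would have to be written out explicitly.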
Usage
Jobs that produce a large amount of output (hundreds of GB or more) may benefit
from requesting the fastio nodes, which are served by the GPFS SAN
file system. This file system is larger (90 TB versus 40 TB) and faster than
the cluster-wide GPFS NSD file system. For more information, refer to the
NCSA TeraGrid User Guide: File Systems and Storage.
Important notes:
- Home directories are mounted on an NFS file system. This arrangement
allows high availability of the login nodes, but is not designed to handle
parallel I/O. Do not direct output from parallel
batch jobs to your home directory. This puts an unnecessary strain
on our NFS servers and can slow down login node access for all users.
- When performing I/O to the GPFS SAN file system (/gpfs_sanscratch)
from a batch job, processes must be running on the fastio resource.
The following is an example line from a batch script:
#PBS -l nodes=32:fastio
Failure to specify the fastio resource will result in output
bouncing back to your /home directory, which can cause the
NFS problems described above and may result in data loss.
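The second note can be illustrated with a guarded batch script. This is a sketch under stated assumptions, not a site-provided template: the #PBS line is the example from the note, while the helper san_scratch_dir and the SAN_ROOT override are hypothetical names added so the check can be exercised away from the cluster.

```shell
#!/bin/sh
#PBS -l nodes=32:fastio

# Print the per-user SAN scratch directory if it is usable; fail otherwise.
# san_scratch_dir is a hypothetical helper; SAN_ROOT is an assumed override
# for testing and defaults to the path given on this page.
san_scratch_dir() {
    dir="${SAN_ROOT:-/gpfs_sanscratch}/${USER:-$(id -un)}"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        printf '%s\n' "$dir"
    else
        echo "error: $dir is not a writable directory; was fastio requested?" >&2
        return 1
    fi
}

if outdir=$(san_scratch_dir); then
    cd "$outdir" || exit 1
    echo "writing job output under $outdir"
    # Launch the parallel application here.
else
    # Stop instead of letting output land in the NFS home directory.
    echo "refusing to fall back to the home directory" >&2
fi
```

In a real job the else branch would normally end with a nonzero exit so that the batch system records the failure.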
Performance Notes
Our GPFS SAN file system, available on the fastio resource, can sustain
reads and writes of more than 1 GB/s under a parallel load. The
GPFS NSD file system typically levels off at around 500 MB/s.
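To put these rates in perspective, the arithmetic below estimates how long a job would spend writing a given amount of output at each rate. The 500 GB output size is a hypothetical example, 1 GB is taken as 1000 MB, and the rates are the approximate figures quoted above, so the results are rough estimates rather than measurements.

```shell
# transfer_seconds SIZE_GB RATE_MB_PER_S
# Integer estimate of the time to move SIZE_GB at a sustained rate,
# assuming 1 GB = 1000 MB. Hypothetical helper for illustration.
transfer_seconds() {
    echo $(( $1 * 1000 / $2 ))
}

size_gb=500   # hypothetical output size
echo "GPFS SAN (~1000 MB/s): $(transfer_seconds "$size_gb" 1000) s"
echo "GPFS NSD (~500 MB/s):  $(transfer_seconds "$size_gb" 500) s"
```

At these rates the same output takes about 500 seconds on GPFS SAN and about 1000 seconds on GPFS NSD.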