NCSA Mass Storage System User Guide

Table of Contents

  1. Overview
  2. Access
  3. MSS Commands
  4. Using MSS Effectively
  5. Sample Sessions and Examples
  6. Additional Resources

1. Overview

New (November 4, 2009): NCSA Mass Storage System Quota Policy

NCSA's Mass Storage System (MSS) is available to all users of NCSA HPC production systems for permanent storage of data. It consists of a high-performance system running UberFS backed by EMC's DiskXtender software, which implements the IEEE Mass Storage Reference Model. Developed at NCSA, UberFS is a new service that is layered on top of the existing mass storage system and provides an updated interface for data management.

This resource provides users with a network-centered, parallel storage system with unlimited storage capacity. Users may store files here permanently and with confidence because the system backs up data automatically: two copies of each file are made without requiring user interaction.

MSS supports large file sizes in a UNIX-like file system environment, with certain recommended use guidelines. It manages its disk space by automatically migrating files off disk onto tape when space becomes low or if files have not been accessed in a certain length of time. If you need to retrieve a file that has been migrated to tape, the system automatically restores it to MSS disk before it is copied to your local system.

Results from production runs, important input data, and other project-related data can be placed in MSS permanently. Data purge policies apply to scratch spaces of production machines at NCSA. Freeing up scratch file systems by moving data to MSS is in the user community's best interest and is the only way to permanently store large amounts of data that exceed the home directory quota. Data can be staged in and out of MSS manually or using batch script commands.

2. Access

NCSA's mass storage system can be accessed both from outside the NCSA domain and from NCSA's production machines at mss.ncsa.uiuc.edu and at mss.ncsa.teragrid.org.

Access to MSS is available via FTP- and SSH-based transfer clients, including GridFTP clients. You can also use SSH to log in to your account and use UNIX command line utilities and basic shell scripting. Each of NCSA's production resources has a dedicated, high-speed connection to MSS. Your MSS login and password are the same as those on the NCSA HPC systems.

To access MSS using GridFTP (GSI authenticated) clients, you must obtain a grid (X.509) credential and maintain a valid proxy certificate. Refer to Creating a Proxy Using myproxy-logon for additional proxy information. A passwordless client is available on NCSA HPC systems.

Below is a summary of the available clients:

Access Method: SSH

  • SSH clients

    Log in to MSS via SSH clients to make use of standard UNIX command line utilities.

  • scp, sftp, filezilla, etc.

    Use SSH-based transfer clients for file transfer. scp and sftp are available on NCSA HPC systems; use your favorite client from your local site or desktop.

Access Method: Passwordless FTP-based clients on NCSA HPC systems

  • mssftp

    The mssftp command is available on all NCSA production machines. Invoking the mssftp command automatically connects you to MSS without the need to enter a login name or password. Once connected, the interface is similar to most text-based ftp clients, with an enhanced command set that provides additional functionality on MSS. Refer to man mssftp for more information.

  • msscmd

    The msscmd command provides a command line interface to mssftp. It is available on all NCSA production machines and can be used both interactively and in batch scripts (see man msscmd).

Access Method: GridFTP clients

  • uberftp

    The uberftp client is available on all NCSA production machines and is free to download. uberftp supports GSI authentication and GridFTP enhancements such as parallel streams. uberftp supports command line arguments and interactive use.

  • globus-url-copy

    globus-url-copy is a command line tool used to initiate file transfers by specifying source and destination URLs. globus-url-copy uses GSI authentication and will initiate a passwordless file transfer between sites over which a valid grid proxy has been issued.

Access Method: Other FTP

  • Only the Kerberized FTP client is supported: /usr/local/krb5/bin/ftp on NCSA production machines.

    See NCSA's Security page for information on installing Kerberos on your local machine. mssftp and msscmd (described above) are recommended for accessing MSS from NCSA production machines since they provide better performance and authenticate automatically.

2.1 Known Supported Secure Access Clients

You can find access clients that have been tested to work with NCSA resources here.


3. MSS Commands

3.1 SSH

The NCSA mass storage system supports an SSH interface into a UNIX environment. Many expected commands of a UNIX-like environment are available. The default shell is tcsh; contact help@ncsa.uiuc.edu to change to bash.

Below are the supported commands (located in /bin):

awk       cp    false     last    nslookup  sleep  test   whereis
basename  csh   find      ldd     openssl   sort   time   which
bash      date  gawk      less    ps        stage  touch  who
cat       diff  grep      ln      pwd       stat   true   xargs
chfam     dig   groups    ls      rm        stty   tty    zsh
chgrp     dir   head      md5sum  rmdir     sum    uname
chmod     du    hostname  migchk  rsync     tail   vi
chown     echo  id        mkdir   sed       tar    vim
cksum     env   kill      more    setfam    tcsh   wc
compress  ex    ksh       mv      sh        tee    wget
Also see "Useful MSS commands" (below) for details on useful commands.

3.2 FTP

Use standard FTP commands such as get, put, mget, mput, ls, dir, and rename to manipulate your files locally or remotely. Enter man ftp for more information on FTP. There is also a help utility available that lists the commands:

ftp> help
Usage "help [topic]" where topic is one of:
!               ?               active          ascii           binary
blksize         bugs            bye             cat             cd
cdup            chgrp           chmod           cksum           close
dcau            debug           delete          dir             disconnect
family          get             glob            hash            help
keepalive       lcat            lcd             lcdup           lchgrp
lchmod          ldelete         ldir            lls             lmdelete
lmkdir          lpwd            lquote          lrename         lrm
lrmdir          ls              lsize           lstage          mdelete
mget            mkdir           mode            mput            open
order           parallel        passive         pbsz            pget
pput            prot            put             pwd             quit
quote           recv            rename          resume          retry
rm              rmdir           runique         send            size
stage           sunique         tcpbuf          type            verbose
versions        wait
Also see "Useful MSS commands" (below) for details on useful commands, and some additional commands supported.

3.3 Useful MSS commands

Below is a list of useful commands to facilitate use of the mass storage system. Most are available via both SSH and FTP access (exceptions are noted).

COMMAND: chgrp
    FUNCTION: Changes the group associated with a file
    ENTRY:    chgrp groupname filename(s)
    NOTES:    Kerberized FTP does not support file name wildcards.

COMMAND: chmod
    FUNCTION: Changes the permissions for a file
    ENTRY:    chmod permissions filename(s)
    NOTES:    Kerberized FTP does not support file name wildcards.

COMMAND: stage
    FUNCTION: Stages (caches) a file from tape to disk
    ENTRY:    stage wait_time filename(s) (FTP)
              stage filename(s) (SSH)
    NOTES:    wait_time is the desired waiting time (locked by the transaction) in seconds. You are given control of the prompt when wait_time has expired. If you do not want to be locked by the transaction, enter a zero value. Files will be staged to disk regardless of the value of wait_time. Kerberized FTP does not support file name wildcards.

COMMAND: wait
    FUNCTION: Toggles on/off waiting for a file to be cached from tape to disk when using the FTP get command
    ENTRY:    wait
    NOTES:    msscmd has this enabled by default, and SSH-based transfers always wait automatically.

COMMAND: ln
    FUNCTION: Creates a symbolic link
    ENTRY:    quote ln target linkname (FTP)
              ln -s target linkname (SSH)

COMMAND: setfam
    FUNCTION: Changes the MSS family of files
    ENTRY:    quote setfam family (FTP)
              setfam family filename(s) (SSH)
    NOTES:    The FTP command changes the family of subsequent files to be uploaded. To change a specific file's family via FTP, the file must be retrieved and uploaded again with the desired family.

Note for Kerberized FTP: all of the above commands must be prefixed with quote (if not already specified). For example:

quote stage 0 myfile

4. Using MSS Effectively

  • Single File size

    For best use of MSS, a 1 TB maximum single file size is recommended; in practice 500 GB would be optimal.

  • Which method of access to use

    For the convenience of a UNIX environment, use SSH to log in to your MSS account. However, SSH-based file transfers may be slower than FTP because of encryption, so FTP is preferred, especially for large data transfers.

  • Using tar Files

    If you frequently store many small- to medium-sized files in MSS, create UNIX tar files before saving to MSS. (When you "tar" files, you create a single file that is a collection of other files.) For information about the tar command, enter man tar.

    If you have a directory with 50 files, for example, that are frequently accessed together, retrieving a single tar file containing all 50 files is significantly faster than retrieving each file one by one. If stored individually, the files could potentially reside on 50 different tapes. When each of these 50 files is accessed again, you would have to wait for the tapes to be mounted and the files cached to MSS disk before they are copied to your local disk. If the 50 files are combined into a single tar file that resides on tape, only one tape mount request is generated and only one tape is read to retrieve your files.

    Important Note: Do not use tar when the resulting file is larger than the recommended single file size (above). Also, if you need to only access a subset of the files at any given time, tar may not be optimal since you will need to retrieve the entire (large) tar file from MSS to access a few files contained within.
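
The bundling advice above can be sketched as a small shell helper. This is only an illustration: the function name bundle_for_mss, the MSS_MAX_BYTES variable, and the directory name run42 are hypothetical, and the limit assumes that 1 TB means 1024^4 bytes. The upload step is left as a comment because it needs msscmd on an NCSA system.

```shell
#!/bin/sh
# Hypothetical helper: bundle a directory into one tar file and check the
# result against the recommended single-file limit before sending it to MSS.

# Recommended maximum single file size (assumption: 1 TB = 1024^4 bytes).
MSS_MAX_BYTES=${MSS_MAX_BYTES:-1099511627776}

bundle_for_mss() {
    src_dir=$1      # directory whose contents should be bundled
    tar_file=$2     # tar file to create (keep it outside src_dir)

    # -C changes into src_dir so member names are relative to it.
    tar -cf "$tar_file" -C "$src_dir" . || return 1

    # Size of the resulting tar file in bytes.
    size=$(wc -c < "$tar_file" | tr -d ' ')
    if [ "$size" -gt "$MSS_MAX_BYTES" ]; then
        echo "warning: $tar_file is $size bytes, above the recommended limit" >&2
        return 2
    fi
    echo "$tar_file: $size bytes"
}

# Example (not run here):
#   bundle_for_mss run42 run42.tar && msscmd "put run42.tar"
```

A non-zero return value lets a batch script stop before uploading a tar file that is larger than recommended.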

  • Built-in tar command

    msscmd has a built-in tar command that can also be used to save or restore a tar file from MSS. See the section "Using TAR" in the msscmd man page for syntax details and examples.

    To save the current directory to a tar file using the built-in tar command:

      msscmd "tar cf filename.tar ."
    

    To restore a tar file from MSS into the current directory:

      msscmd "tar xf filename.tar"
    
  • Preserving File Creation Information

    The current FTP protocol standard does not support saving files with the remote site's creation and modification times, so this information is lost when the files are retrieved from MSS. When a file is retrieved, its access and creation times are set to the time of retrieval.

    One way to preserve files' creation times and other file attributes (such as permissions) is to tar the files. The file attributes of the tar file will be lost, but the attributes of the files within the tarball will be preserved.

    Using the SSH interface, metadata can be preserved by using the -p option when copying files within MSS:
      cp -p mssSRC mssDEST
    
    and when copying files between other machines and MSS:
      scp -p mssSRC otherDEST
      scp -p otherSRC mssDEST
    
  • Recursive features

    The FTP clients mssftp and uberftp support recursive features for:

    • chgrp & lchgrp
    • chmod & lchmod
    • rm & lrm
    • ls & lls
    • get
    • put
    • stage

    In SSH sessions, the following commands have recursive abilities:

    • ls
    • cp
    • rm
    • chgrp
    • chmod
    • stage
    • grep
    • scp

  • Tape-to-Disk Restoring

    When a file needs to be retrieved from MSS tape, it is normally first copied to MSS disk and then transferred to the remote host. To make room for files being read from tape, old files on the MSS disk are purged.

    Two additional fields are displayed when using the ls command.

    ftp> ls *file
    -rwxr-xr-x     1   jsiwek       ac DK           common       349960 Jan  9 10:12 recentfile
    -rwxr-xr-x     1   jsiwek       ac AR           common      6973440 Jan  9 10:12 olderfile
    

    The first extra field displays the value AR or DK for each file and directory:

    • AR (archive): File or directory is archived to tape and currently resides there.
    • DK (disk): File or directory currently resides on MSS disk. The file may also reside on tape, but MSS uses the disk copy for performance reasons.

    The additional field displayed after DK or AR in the listing specifies the "family" to which the listed file belongs. More information on MSS Families is presented below.
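
The AR/DK field makes it easy to pick out files that would need a tape mount. The sketch below is a hypothetical helper (mss_on_tape is not an MSS command) that reads a listing on standard input and assumes the column layout shown above, with AR or DK in the fifth field and the file name last. File names containing spaces are not handled.

```shell
#!/bin/sh
# Print the names of files that currently reside on tape (AR).
# Assumed columns: perms, links, owner, group, AR/DK, family, size,
# month, day, time, name.
mss_on_tape() {
    awk '$5 == "AR" { print $NF }'
}

# Demo using the two listing lines from this section.
printf '%s\n' \
  '-rwxr-xr-x     1   jsiwek       ac DK           common       349960 Jan  9 10:12 recentfile' \
  '-rwxr-xr-x     1   jsiwek       ac AR           common      6973440 Jan  9 10:12 olderfile' \
  | mss_on_tape
# prints: olderfile
```

On an NCSA system the same filter could presumably be fed from a real listing, for example the output of msscmd "ls".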

    During heavy usage times when there are many pending tape mount requests, retrieving a file that has been migrated to tape could take several minutes. When you use the get command, you will get a message that tells you the file is being retrieved from tape.

    ftp> get olderfile
    olderfile is being retrieved from the archive. Please retry the transfer once
    the file is staged or use the wait command.
    
    With mssftp, the wait option (see below) is turned off by default, so the FTP client does not wait until the file is cached from tape, and you must issue a second get command to copy the file from MSS disk to the local resource. After a few minutes:
    ftp> ls olderfile
    -rwxr-xr-x     1   jsiwek       ac DK           common      6973440 Jan  9 10:12 olderfile
    ftp> get olderfile
    olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
    

    Alternatively, toggle wait by typing wait prior to issuing the first get command, and it will not be necessary to do a second.

    ftp> ls olderfile
    -rwxr-xr-x     1   jsiwek       ac AR           common      6973440 Jan  9 10:12 olderfile
    ftp> wait
    WAIT is enabled.
    ftp> get olderfile
    olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
    
    Or use the msscmd command instead.
    [jsiwek@honest2 ~]$ msscmd "ls olderfile, get olderfile"
    -rwxr-xr-x     1   jsiwek       ac AR           common      6973440 Jan  9 10:12 olderfile
    olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
    
    The wait and msscmd methods eliminate the need to issue a second get (the stage command, described below, is another alternative), but you may still wait a significant time before archived files begin transferring.
  • Improving Access Times Using Prestaging

    MSS can prestage files that have been archived to tape. To copy the files to MSS disk for near-future access, use the stage command (see Useful MSS Commands for more details). stage copies your files to MSS disk without actually transferring them to the local filesystem, so that when you later attempt to retrieve your files, they can be transferred directly from MSS disk.

    For example, if you have files file1, file2, and file3 in your directory in MSS and they all reside on tape, you can copy them to MSS disk by entering any of the following mssftp commands:

     ftp> stage 0 file?
     ftp> stage 0 file*
     ftp> stage 0 file[1-3]
     ftp> stage 0 file1 file2 file3
    

    The first argument to stage (0 in the examples above) is the number of seconds that the command will wait at the prompt for the stage to complete. Using 0 is recommended, since the staging request is always sent to MSS and proceeds in the background even if the stage command returns to the prompt before the file is retrieved from tape.

    Recursive staging of a directory can be done with the -r option:

    ftp> stage -r 0 dir
    dir/file1: Success
    dir/file2: Success
    dir/file3: Success
    dir/dir2/file4: Success
    

    There is no penalty for using stage on files that already exist on disk. The command just returns the "Success" message. If the file exists solely on tape and you use the staging commands on this file, the "Staging" message is displayed.

    stage is most effective for sessions that retrieve many large archived files. Such a session starts by issuing a stage command for all files, followed by a waiting get command (by toggling wait or using msscmd) for all files. Files continue staging to MSS disk in the background while files that have finished staging are transferred to the local system, so each subsequent transfer should see a shorter latency.

    HINT: Stage only the files that you need in the near future. Prestaging all your files could purge other files on disk that you wanted to use. And if you prestage your files too far in advance, they could be purged from disk before you have a chance to access them.

    NOTE: Kerberized ftp does not have all the functionality listed above. One must use the `quote stage 0 file` command to stage individual files. Wildcards are not supported.
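
The stage-then-get sequence described in this section could be scripted roughly as follows. This is a sketch under assumptions: mss_stage_and_get and the MSSCMD variable are hypothetical names, and the comma-separated command string follows the msscmd example shown earlier in this guide. Setting MSSCMD=echo gives a dry run that prints the command strings instead of contacting MSS.

```shell
#!/bin/sh
# Hypothetical wrapper: stage a set of files, then fetch them with msscmd.
# MSSCMD can be overridden (for example MSSCMD=echo) for a dry run.
MSSCMD=${MSSCMD:-msscmd}

mss_stage_and_get() {
    [ "$#" -gt 0 ] || { echo "usage: mss_stage_and_get file..." >&2; return 1; }

    # One stage request for all files; 0 means do not block at the prompt.
    "$MSSCMD" "stage 0 $*" || return 1

    # Build "get f1, get f2, ..." as a single comma-separated command string.
    gets=""
    for f in "$@"; do
        gets="${gets:+$gets, }get $f"
    done
    "$MSSCMD" "$gets"
}

# Dry run in a subshell so the MSSCMD override does not leak out.
( MSSCMD=echo; mss_stage_and_get file1 file2 file3 )
# prints:
#   stage 0 file1 file2 file3
#   get file1, get file2, get file3
```

Because msscmd waits for staging by default, the get commands should block only on files that have not yet reached MSS disk.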

  • MSS Families

    If you always access a large number of files at the same time, it might be useful to have all the files belong to the same family. All files that belong to a family reside on a common set of tapes, so there are fewer tape mounts required to access the files. An approximate rule of thumb is that if you save more than 500 files every month for an extended period with a plan for frequent use of them, it may be worth asking for a family. Users can request a family to be assigned to them by sending an e-mail to help@ncsa.uiuc.edu.

  • Automated Saving of Files from Batch Jobs

    On the NCSA Intel 64 Cluster (abe) and the SGI Altix (cobalt), the saveafterjob utility is available for automated, guaranteed saving of files from batch jobs to mass storage. Users writing their own job scripts should specify the $SCR environment variable as the destination file system for the job.

    Important Note: Be sure to tell saveafterjob about every file that needs to be saved. The default behavior of saveafterjob is to purge the temporary job directory upon successful transfer of the specified files. See the document Automated Saving of Files from Batch Jobs for more information.

  • MSS Directory Listing

    Occasionally you may have a need to get an inventory of your files in MSS. There are several ways to list the contents of your MSS directory tree (or a subdirectory):

    Method                                 Command
    mssls utility on NCSA HPC systems      % mssls [mss_dir]
    recursive listing via SSH              % ssh mss.ncsa.uiuc.edu "ls -lR [mss_dir]"
    msscmd (or interactively with mssftp)  % msscmd "ls -r [mss_dir]"
    uberftp                                % uberftp mss.ncsa.uiuc.edu "ls -r [mss_dir]"
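
A recursive listing can be reduced to a short inventory with awk. The sketch below is a hypothetical helper (mss_inventory is not an MSS command); it assumes the long-listing columns used throughout this guide, with AR or DK in the fifth field and the size in the seventh, and it counts regular files only.

```shell
#!/bin/sh
# Summarise an MSS long listing read from standard input: number of files
# and total bytes per residency state (AR = tape, DK = disk). Directory
# entries, "total" lines, and directory headers are skipped.
mss_inventory() {
    awk '
        $1 ~ /^-/ && ($5 == "AR" || $5 == "DK") {
            count[$5]++
            bytes[$5] += $7
        }
        END {
            # %.0f avoids integer overflow in some awk builds for large totals.
            printf "AR %d files %.0f bytes\n", count["AR"], bytes["AR"]
            printf "DK %d files %.0f bytes\n", count["DK"], bytes["DK"]
        }'
}

# Demo with listing lines in the format shown in this guide.
printf '%s\n' \
  'total 64' \
  '-rwxr-xr-x   1 jsiwek ac DK common       8 Jan  9 10:12 myfile' \
  'drwxr-xr-x   2 jsiwek ac DK common      42 Jan 21 16:03 subdir' \
  '-rw-------   1 jsiwek ac AR common  194560 May 13 12:48 oldfile' \
  | mss_inventory
# prints:
#   AR 1 files 194560 bytes
#   DK 1 files 8 bytes
```

In practice the input would come from one of the listing commands above, for example ssh mss.ncsa.uiuc.edu "ls -lR" piped into mss_inventory.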

5. Sample Sessions and Examples

5.1 Sample SSH Session

Connecting

Use the mss.ncsa.uiuc.edu hostname to connect. Accept the fingerprint if connecting for the first time, then enter your NCSA Kerberos password.
[jsiwek@co-login ~]$ ssh mss.ncsa.uiuc.edu
The authenticity of host 'mss.ncsa.uiuc.edu (141.142.31.94)' can't be established.
RSA key fingerprint is f7:fc:9f:45:91:9e:c6:70:b6:75:af:bf:b1:ae:b2:bf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mss.ncsa.uiuc.edu' (RSA) to the list of known hosts.
jsiwek@mss.ncsa.uiuc.edu's password:
Last login: Fri Jan  9 11:36:28 2009 from co-login.ncsa.uiuc.edu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
mss ac/jsiwek>

Standard UNIX commands

Many of the basic commands expected of a UNIX-like environment are available.

Getting help on a command

Commands often come with a --help option to help explain their usage.
mss jsiwek/testdir> mkdir --help
Usage: mkdir [OPTION] DIRECTORY...

Displaying Directories

The current directory is obtained with pwd, and also displayed along with its parent directory in the shell prompt. A detailed, long listing of a directory can be seen with ls -l. Navigating into different directories is done with the cd command.
mss ac/jsiwek> pwd
/UROOT/u/ac/jsiwek

mss ac/jsiwek> ls
test.tar    testdir

mss ac/jsiwek> cd testdir

mss jsiwek/testdir> ls -l
total 64
-rwxr-xr-x   1 jsiwek ac DK common  8 Jan  9 10:12 myfile
drwxr-xr-x   2 jsiwek ac DK common 42 Jan 21 16:03 subdir

Manipulating Files and Directories

New directories can be created with mkdir. New files can be created and edited with vim. It's possible to display contents and concatenate files with cat or to search for patterns contained within files with grep.
mss ac/jsiwek> mkdir dir

mss ac/jsiwek> echo grepForThis > grepMe
mss ac/jsiwek> echo grepForThis > grepMe2
mss ac/jsiwek> cat grepMe
grepForThis
mss ac/jsiwek> grep grepForThis *
grepMe:grepForThis
grepMe2:grepForThis

Deleting Files and Directories

The -r option must be used for directory deletion.
mss ac/jsiwek> rm testdir/myfile
mss ac/jsiwek> rm dir
rm: cannot remove `dir': Is a directory
mss ac/jsiwek> rm -r dir

Listing File/Directory structure and count totals

To see the layout of your directories and files, plus the totals for each, use the tree command. tree takes a directory as an argument; if no argument is given, it uses the current working directory.
[mss ~]$ tree programs

programs
|-- c
|   |-- mpi
|   |   |-- batch
|   |   |   `-- tg.login-hw2-mpi.c.batch
|   |   |-- hw2-mpi.c
|   |   |-- hw2-mpi.c.orginal
|   |   |-- interactive
|   |   |-- ring-mpi.obj
|   |   |-- ring-mpi_topo.c
|   |   |-- ring-mpi_topo.o
|   |   `-- ring-mpi_topo.tar
|   |-- serial
|   |   |-- batch
|   |   `-- interactive
|   `-- test-work
|       |-- a-static.out
|       |-- a.out
|       |-- hw2-mpi.c
|       |-- titan-mpi2hw.batch
|       `-- tmp
|           |-- gst-tg-login-srl.batch
|           |-- tnMPI-061103.e
|           `-- tnMPI-061103.o
|-- fortran
|   |-- mpi
|   |   |-- Sample_mpio.f90
|   |   |-- batch
|   |   |-- hw2-mpi.f
|   |   |-- hw2-mpi.f-out

...

|       `-- a,out
`-- java
    |-- mpi
    |   |-- batch
    |   `-- interactive
    `-- serial
        |-- batch
        `-- interactive

24 directories, 37 files

Note: For large, complex directory trees you may want to redirect the standard output to a file.

Additional Ported MSS commands

  • stage - copies data from MSS tape archive into MSS disk cache
    mss ac/jsiwek> ls -l oldfile
    -rw-------   1 jsiwek ac AR common    194560 May 13 12:48 oldfile
    mss ac/jsiwek> stage oldfile
    oldfile: Staging
    0 files remaining.
    
    A few minutes later:
    -rw-------   1 jsiwek ac DK common    194560 May 13 12:48 oldfile
    
  • chfam - change the MSS family of files
    mss ac/jsiwek> chfam
    Usage: chfam family [filenames]
    mss ac/jsiwek> touch newfile
    mss ac/jsiwek> ls -l newfile
    -rw-r--r--   1 jsiwek ac DK  common 0 Jan  9 12:30 newfile
    mss ac/jsiwek> chfam unitest newfile
    mss ac/jsiwek> ls -l newfile
    -rw-r--r--   1 jsiwek ac DK unitest 0 Jan  9 12:30 newfile
    

Transferring Files

MSS can be the source or destination of SSH-based transfers such as scp.
mss ac/jsiwek> echo "xxxxx" > file
mss ac/jsiwek> scp file jsiwek@cobalt.ncsa.uiuc.edu:~
jsiwek@cobalt.ncsa.uiuc.edu's password:
file                                       100%    6     0.0KB/s   0.0KB/s   00:00
mss ac/jsiwek> rm file
mss ac/jsiwek> scp jsiwek@cobalt.ncsa.uiuc.edu:~/file .
jsiwek@cobalt.ncsa.uiuc.edu's password:
file                                       100%    6     0.0KB/s   0.0KB/s   00:00
mss ac/jsiwek> cat file
xxxxx

5.2 Sample FTP Session

The following sample FTP session with MSS uses a subset of common FTP and MSSFTP commands.

Connecting

To initiate an FTP session to MSS from an NCSA UNIX production machine, a user enters the mssftp command at the prompt.

[jsiwek@co-login ~]$ mssftp
220 mss.ncsa.uiuc.edu GridFTP Server 3.11 (gcc64dbgpthr, 1213742010-78) [Globus Toolkit 4.2.0] ready.
230-User jsiwek logged in.
ftp>

Getting help on a command

More information on a command is available by specifying the command as an option to help:

ftp> help rm
Removes the remote file system object(s).

Usage: rm [-r] object1 [object1...objectn]
-r   Recursively remove the given directory.

Note: Commands only operate recursively in subdirectories if -r option is given and available.

Displaying the Directory

The example below displays the current MSS directory path by entering the pwd command and then displays the contents of the working directory by entering the ls command.

ftp> pwd
/UROOT/u/ac/jsiwek
ftp> ls
drwxr-x---     6   jsiwek       ac DK           common           80 Jan  7 16:49 .
drwxr-xr-x 11621  unitree       ac DK           common       262144 Jan  8 16:19 ..
drwx------     2   jsiwek       ac DK           common           24 Dec 18 11:58 .ssh
-rw-r--r--     1   jsiwek       ac DK           common    778137600 Dec 30 09:43 test.tar
drwx------     2   jsiwek       ac DK           common        16384 Jan  8 13:07 testdir

Transferring Files

The user enters the FTP put command to transfer the file myfile into MSS. Next, the chmod command is used to change the permissions of the file to user read/writable, group readable, and world inaccessible (640). Finally, the group is changed with chgrp.

ftp> put myfile
myfile: 8 bytes in 1.178463 Seconds (6.789 B/s)
ftp> ls myfile
-rw-r--r--     1   jsiwek       ac DK           common            8 Jan  9 11:11 myfile
ftp> chmod 640 myfile
ftp> ls myfile
-rw-r-----     1   jsiwek       ac DK           common            8 Jan  9 11:11 myfile
ftp> chgrp tornado myfile
ftp> ls myfile
-rw-r-----     1   jsiwek  tornado DK           common            8 Jan  9 11:11 myfile
For creating and adding users to MSS groups, contact help@ncsa.uiuc.edu.

Deleting Files

Within mssftp, delete files with the rm command. End the FTP session with MSS by entering quit.

ftp> rm myfile
ftp> ls myfile
No match for myfile
ftp> quit
221 Goodbye.

Note: In Kerberized FTP, rm expands to rmdir, which only deletes directories. To delete files, use delete.

6. Additional Resources