Table of Contents
- Overview
- Access
- MSS Commands
- Using MSS Effectively
- Sample Sessions and Examples
- Additional Resources
1. Overview
(November 4, 2009): NCSA Mass Storage System Quota Policy
NCSA's Mass Storage System (MSS) is available to all users of NCSA HPC
production systems for permanent storage of data.
It consists of a high-performance system running UberFS, backed by EMC's
DiskXtender software, which implements the IEEE Mass Storage Reference Model.
Developed at NCSA, UberFS is a new service that is layered on top of the
existing mass storage system and provides an updated interface for data
management.
This resource provides users with a network-centered, parallel storage system with unlimited storage capacity. Users can store files here permanently and with confidence because the system backs up data automatically: two copies of each file are made without any user interaction.
MSS supports large file sizes in a UNIX-like file system environment, with certain recommended use guidelines. It manages its disk space by automatically migrating files off disk onto tape when space becomes low or if files have not been accessed in a certain length of time. If you need to retrieve a file that has been migrated to tape, the system automatically restores it to MSS disk before it is copied to your local system.
Results from production runs, important input data, and other project-related data can be
placed in MSS permanently. Data purge policies apply to the scratch spaces
of production machines at NCSA. Freeing up scratch file systems
by moving data to MSS is in the user community's best interest, and it is the only way
to permanently store large amounts of data that exceed the home directory quota.
Data can be staged in and out of MSS manually or with batch script commands.
2. Access
NCSA's mass storage system can be accessed both from outside the NCSA domain
and from NCSA's production machines at mss.ncsa.uiuc.edu
and at mss.ncsa.teragrid.org.
Access to MSS is available via FTP-based and SSH-based transfer clients, including
GridFTP clients. You can
also use SSH to log in to your account and use UNIX command line utilities
and basic shell scripting.
Each of NCSA's production resources has a dedicated, high-speed connection to MSS.
Your MSS login and password are the same as those on the NCSA HPC systems.
To access MSS using GridFTP (GSI authenticated) clients, you must obtain a grid
(X.509) credential and maintain
a valid proxy certificate. Refer to
Creating a Proxy Using myproxy-logon for additional proxy information. A
passwordless client is available on NCSA HPC systems.
Below is a summary of the available clients:
| Access Method | Tools | Description |
|---|---|---|
| SSH | SSH clients | Log in to MSS via SSH clients to make use of standard UNIX command line utilities. |
| SSH | scp, sftp, filezilla, etc. | Use SSH-based transfer clients for file transfer. scp and sftp are available on NCSA HPC systems; use your favorite client from your local site or desktop. |
| Passwordless FTP-based clients on NCSA HPC systems | mssftp | The mssftp command is available on all NCSA production machines. Invoking the mssftp command automatically connects you to MSS without the need to enter a login name or password. Once connected, the interface is similar to most text-based ftp clients, with an enhanced command set that provides additional functionality on MSS. Refer to man mssftp for more information. |
| Passwordless FTP-based clients on NCSA HPC systems | msscmd | The msscmd command provides a command line interface to mssftp. It is available on all NCSA production machines and can be used both interactively and in batch scripts (see man msscmd). |
| GridFTP clients | uberftp | The uberftp client is available on all NCSA production machines and is free to download. uberftp supports GSI authentication and GridFTP enhancements such as parallel streams. uberftp supports command line arguments and interactive use. |
| GridFTP clients | globus-url-copy | globus-url-copy is a command line tool used to initiate file transfers by specifying source and destination URLs. globus-url-copy uses GSI authentication and will initiate a passwordless file transfer between sites for which a valid grid proxy has been issued. |
| Other FTP | Only the Kerberized FTP client is supported | /usr/local/krb5/bin/ftp on NCSA production machines. See NCSA's Security page for information on installing Kerberos on your local machine. mssftp and msscmd (described above) are recommended for accessing MSS from NCSA production machines since they provide better performance and authenticate automatically. |
2.1 Known Supported Secure Access Clients
You can find access clients that have been tested to work with NCSA resources
here.
3. MSS Commands
3.1 SSH
The NCSA mass storage system supports an SSH interface into a UNIX environment.
Many expected commands of a UNIX-like environment are available.
The default shell is tcsh; contact help@ncsa.uiuc.edu to change to bash.
Below are the supported commands (located in /bin):
awk cp false last nslookup sleep test whereis
basename csh find ldd openssl sort time which
bash date gawk less ps stage touch who
cat diff grep ln pwd stat true xargs
chfam dig groups ls rm stty tty zsh
chgrp dir head md5sum rmdir sum uname
chmod du hostname migchk rsync tail vi
chown echo id mkdir sed tar vim
cksum env kill more setfam tcsh wc
compress ex ksh mv sh tee wget
Also see
"Useful MSS commands" (below) for details on
useful commands.
3.2 FTP
Use standard FTP commands such as get, put,
mget, mput, ls, dir, and
rename to manipulate your files locally or remotely.
Enter man ftp for more information on FTP.
There is also a help utility available that lists the commands:
ftp> help
Usage "help [topic]" where topic is one of:
! ? active ascii binary
blksize bugs bye cat cd
cdup chgrp chmod cksum close
dcau debug delete dir disconnect
family get glob hash help
keepalive lcat lcd lcdup lchgrp
lchmod ldelete ldir lls lmdelete
lmkdir lpwd lquote lrename lrm
lrmdir ls lsize lstage mdelete
mget mkdir mode mput open
order parallel passive pbsz pget
pput prot put pwd quit
quote recv rename resume retry
rm rmdir runique send size
stage sunique tcpbuf type verbose
versions wait
Also see
"Useful MSS commands" (below) for details on
useful commands, and some additional commands supported.
3.3 Useful MSS commands
Below is a list of useful commands to facilitate use of the mass storage
system. Most are available via both SSH and FTP access (exceptions are
noted).
| COMMAND | FUNCTION | ENTRY | NOTES |
|---|---|---|---|
| chgrp | Changes the group associated with a file | chgrp groupname filename(s) | Kerberized FTP does not support file name wildcards. |
| chmod | Changes the permissions for a file | chmod permissions filename(s) | Kerberized FTP does not support file name wildcards. |
| stage | Stages (caches) a file from tape to disk | stage wait_time filename(s) (FTP); stage filename(s) (SSH) | wait_time is the desired waiting time in seconds, during which the prompt is locked by the transaction. Control of the prompt returns when wait_time has expired. If you do not want to be locked by the transaction, enter a zero value. Files are staged to disk regardless of the value of wait_time. Kerberized FTP does not support file name wildcards. |
| wait | Toggles on/off waiting for a file to be cached from tape to disk when using the FTP get command | wait | msscmd has this enabled by default, and SSH-based transfers always wait automatically. |
| ln | Creates a symbolic link | quote ln target linkname (FTP); ln -s target linkname (SSH) | |
| setfam | Changes the MSS family of files | quote setfam family (FTP); setfam family filename(s) (SSH) | The FTP command changes the family of files uploaded subsequently. To change a specific file's family via FTP, the file must be retrieved and uploaded again with the desired family. |
Note for Kerberized FTP: all of the above commands must be prefixed with quote
(if not already specified). For example:
quote stage 0 myfile
4. Using MSS Effectively
- Single File size
For best use of MSS, a 1 TB maximum single file size is recommended; in
practice 500 GB would be optimal.
- Which method of access to use
For the convenience of a UNIX environment, use SSH to log in to your
MSS account. However, SSH-based file transfers may be slower than FTP
because of encryption, so FTP is preferred, especially for large data
transfers.
- Using tar Files
If you frequently store many small- to medium-sized files in MSS,
create UNIX tar files before saving to MSS. (When you "tar" files, you
create a single file that is a collection of other files.) For information
about the tar command, enter man tar.
If you have a directory with 50 files, for example, that are frequently
accessed together, retrieving a single tar file containing all 50 files is
significantly faster than retrieving each file one by one. If stored
individually, the files could potentially reside on 50 different tapes.
When each of these 50 files is accessed again, you would
have to wait for the tapes to be mounted and the files cached to
MSS disk before they are copied to your local disk. If the 50 files
are combined into a single tar file that resides on tape, only one tape
mount request is generated and only one tape is read to retrieve your files.
Important Note: Do not use tar when the resulting file
is larger than the recommended single file size (above).
Also, if you need to access only a subset of the files at any given time,
tar may not be optimal, since you must retrieve the entire (large) tar file
from MSS to access the few files you need.
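The size guideline above can be checked before bundling. The following is a minimal sketch, assuming the 1 TB recommended maximum from this section and using `du -sk` as an approximation of the eventual tar file size (tar adds a small amount of header overhead). The function name `fits_in_one_tar` is hypothetical and is not an MSS utility.

```shell
#!/bin/sh
# Sketch: decide whether a directory is small enough to bundle into a single
# tar file before saving it to MSS. The function name is hypothetical.

# 1 TB expressed in kilobytes (1024^3 KB), the unit reported by `du -sk`.
MSS_MAX_KB=1073741824

# fits_in_one_tar DIR [LIMIT_KB]
# Succeeds (exit status 0) when DIR's disk usage is at or below the limit.
fits_in_one_tar() {
    dir=$1
    limit_kb=${2:-$MSS_MAX_KB}
    # du prints "<kilobytes><TAB><path>"; keep only the number.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$used_kb" -le "$limit_kb" ]
}
```

A possible use is `fits_in_one_tar results && tar cf results.tar results`, where `results` stands for your own directory. A lower limit, such as the 500 GB figure mentioned earlier, can be passed as the second argument in kilobytes.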
Built-in tar command
msscmd has a built-in tar command that can also be used to save or restore a tar file from MSS.
See the section "Using TAR" in the msscmd man page for syntax
details and examples.
To save the current directory to a tar file using the built-in tar command:
msscmd "tar cf filename.tar ."
To restore a tar file from MSS into the current directory:
msscmd "tar xf filename.tar"
- Preserving File Creation Information
The current FTP protocol standard does not support saving files with the remote
site's creation and modification times, so this information is lost when
the files are retrieved from MSS. When a file is retrieved, its access and
creation times are set to the time of retrieval.
One way to preserve files' creation times and other file attributes
(such as permissions) is to tar the files. The file attributes of
the tar file will be lost, but the attributes of the files within the tarball
will be preserved.
Using the SSH interface, metadata can be preserved by using the -p option
when copying files within MSS:
cp -p mssSRC mssDEST
and when copying files between other machines and MSS:
scp -p mssSRC otherDEST
scp -p otherSRC mssDEST
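The tar approach described above can be tried locally before relying on it. This is a small sketch using only temporary, hypothetical paths; it does not contact MSS. It shows that a file's modification time survives a tar round trip, and it uses the `-p` option on extraction so that recorded permissions are restored as well.

```shell
#!/bin/sh
# Sketch: show that a file's modification time survives a tar round trip.
# All paths are temporary and hypothetical; nothing here contacts MSS.
set -e
work=$(mktemp -d)
mkdir "$work/src" "$work/restored"

# Create a file and give it an old modification time (1 Jan 2009, 12:00).
echo "sample data" > "$work/src/result.dat"
touch -t 200901011200 "$work/src/result.dat"

# Bundle the directory; the archive records each member's mtime and mode.
tar cf "$work/bundle.tar" -C "$work/src" .

# Extract elsewhere, with -p so permissions are restored as recorded.
tar xpf "$work/bundle.tar" -C "$work/restored"

# Both copies are listed with the same 2009 timestamp.
ls -l "$work/src/result.dat" "$work/restored/result.dat"
```

The tar file itself would still receive a new timestamp when transferred over FTP; only the members inside it keep their original attributes.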
- Recursive features
The FTP clients mssftp and uberftp support recursive features for:
- chgrp & lchgrp
- chmod & lchmod
- rm & lrm
- ls & lls
- get
- put
- stage
In SSH sessions, the following commands have recursive abilities:
- ls
- cp
- rm
- chgrp
- chmod
- stage
- grep
- scp
- Tape-to-Disk Restoring
When a file needs to be retrieved from MSS tape, it is normally first copied to MSS disk and then transferred to the remote host. To make room for files being read from tape, old files on the MSS disk are purged.
Two additional fields are displayed when using the ls
command.
ftp> ls *file
-rwxr-xr-x 1 jsiwek ac DK common 349960 Jan 9 10:12 recentfile
-rwxr-xr-x 1 jsiwek ac AR common 6973440 Jan 9 10:12 olderfile
The first extra field displays the value AR or DK for each
file and directory:
- AR (archive): File or directory is archived to tape and currently resides
there.
- DK (disk): File or directory currently resides on MSS disk. The file
may also reside on tape, but MSS uses the disk copy for performance
reasons.
The second extra field, displayed after DK or AR, names the "family" to
which the listed file belongs. More information on MSS Families is presented below.
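The extra AR/DK field makes it possible to pick out files that would need a tape retrieval. The sketch below assumes the column layout of the sample listing above (field 5 is AR or DK, field 6 is the family, the last field is the file name) and assumes file names without spaces. The function name `archived_files` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the names of files that currently reside on tape (AR),
# given MSS "ls" output on standard input. Field positions are assumed
# from the sample listing: 5 = AR/DK, 6 = family, last = file name.
archived_files() {
    awk '$5 == "AR" { print $NF }'
}

# Example using the two sample lines from the listing above.
archived_files <<'EOF'
-rwxr-xr-x 1 jsiwek ac DK common 349960 Jan 9 10:12 recentfile
-rwxr-xr-x 1 jsiwek ac AR common 6973440 Jan 9 10:12 olderfile
EOF
```

With the sample input, only `olderfile` is printed, since `recentfile` is marked DK.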
During heavy usage times when there are many pending tape mount requests,
retrieving a file that has been migrated to tape could take several minutes.
When you use the get command, you will get a message that tells you the file is being retrieved from tape.
ftp> get olderfile
olderfile is being retrieved from the archive. Please retry the transfer once
the file is staged or use the wait command.
With mssftp, the wait option (see below) is turned off by default, so the FTP client does not wait until the file is
cached from tape. You must therefore issue a second get command to copy the file from MSS disk to the local resource.
After a few minutes:
ftp> ls olderfile
-rwxr-xr-x 1 jsiwek ac DK common 6973440 Jan 9 10:12 olderfile
ftp> get olderfile
olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
Alternatively, toggle wait by typing wait prior to issuing the first get command, and it will not be necessary to do a second.
ftp> ls olderfile
-rwxr-xr-x 1 jsiwek ac AR common 6973440 Jan 9 10:12 olderfile
ftp> wait
WAIT is enabled.
ftp> get olderfile
olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
Or use the msscmd command instead.
[jsiwek@honest2 ~]$ msscmd "ls olderfile, get olderfile"
-rwxr-xr-x 1 jsiwek ac AR common 6973440 Jan 9 10:12 olderfile
olderfile: 6973440 bytes in 0.688793 Seconds (9.655 MB/s)
The wait and msscmd methods eliminate the need to issue a second get or to use the stage command
(described below), but you may still wait a significant time before archived files begin transferring.
- Improving Access Times Using Prestaging
MSS can prestage files that have been archived to tape. To copy the files
to MSS disk for near-future access, use the stage
command (see "Useful MSS commands" above for more details). stage
copies your files to MSS disk without actually transferring them to the
local filesystem, so that when you later retrieve your files, they
can be transferred directly from MSS disk.
For example, if you have files file1, file2, and
file3 in your directory in MSS and they all reside on
tape, you can copy them to MSS disk by entering any of the following mssftp
commands:
ftp> stage 0 file?
ftp> stage 0 file*
ftp> stage 0 file[1-3]
ftp> stage 0 file1 file2 file3
The first argument of stage shown above (0) is the number of seconds that the command waits at the prompt for the stage to complete. A value of 0 is recommended, since the staging request is always sent to MSS and proceeds in the background even if the stage command returns to the prompt before the file is retrieved from tape.
Recursive staging of a directory can be done with the -r option:
ftp> stage -r 0 dir
dir/file1: Success
dir/file2: Success
dir/file3: Success
dir/dir2/file4: Success
There is no penalty for using stage
on files that already exist on disk. The command just returns the "Success" message.
If the file exists solely on tape and you use
the staging commands on this file, the "Staging" message is displayed.
stage is most effective for sessions that retrieve many large archived files.
Such a session starts by issuing a stage command for all files, followed by a waiting get
command (by toggling wait or by using msscmd) for all files.
This sequence lets files continue staging to MSS disk in the background while
files that have finished staging are transferred to the local system, which should
shorten the latency of each subsequent transfer.
HINT: Only stage the files that have a near-immediate need. Prestaging
all your files could purge other files on disk that you wanted to
use. Or, if you prestage your files too far in advance, they could be purged
from disk before you have a chance to access them.
NOTE: Kerberized ftp does not have all the functionality listed above. One
must use the `quote stage 0 file` command to stage individual files.
Wildcards are not supported.
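The stage-then-get sequence described in this item can be assembled into a single msscmd command string. The sketch below only builds and prints the string; it executes nothing. The comma separator follows the msscmd example shown earlier in this section. That msscmd accepts stage in the same way as mssftp is an assumption, and the function name `stage_then_get` is hypothetical.

```shell
#!/bin/sh
# Sketch: build one msscmd command string that first stages every file and
# then fetches each one. Nothing is executed here; the string is printed.
# Assumption: msscmd accepts "stage" as mssftp does, and separates
# commands with commas as in the msscmd example in this section.
stage_then_get() {
    # One stage request (wait time 0) covering all files.
    cmd="stage 0 $*"
    # One get per file, appended in order.
    for f in "$@"; do
        cmd="$cmd, get $f"
    done
    printf '%s\n' "$cmd"
}

stage_then_get file1 file2 file3
```

The printed string could then be passed to msscmd, for example `msscmd "$(stage_then_get file1 file2 file3)"`. Because msscmd waits by default, each get would block only until its own file has been staged.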
- MSS Families
If you always access a large number of files at the same time, it
might be useful to have all the files belong to the same family.
All files that belong to a family reside on a common set of tapes,
so there are fewer tape mounts required to access the files.
An approximate rule of thumb is that if you save more than 500 files every month
for an extended period with a plan for frequent use of them,
it may be worth asking for a family. Users can request a family to be assigned
to them by sending an e-mail to
help@ncsa.uiuc.edu.
Automated Saving of Files from Batch Jobs
On the NCSA Intel 64 Cluster (abe) and the SGI Altix (cobalt), the saveafterjob utility is
available for automated, guaranteed saving of files from batch jobs
to mass storage. Users writing their own job scripts should specify the $SCR
environment variable as the destination file system for the job.
Important Note: Be sure to specify all files that need to be saved
to saveafterjob. The default behavior of saveafterjob is
to purge the temporary job directory upon successful transfer of the specified
files. See the document
Automated Saving of Files from Batch Jobs for more information.
- MSS Directory Listing
Occasionally you may have a need to get an inventory of your files in MSS.
There are several ways to list the contents of your MSS directory tree (or a
subdirectory):
| mssls utility on NCSA HPC systems | % mssls [mss_dir] |
| recursive listing via SSH | % ssh mss.ncsa.uiuc.edu "ls -lR [mss_dir]" |
| msscmd (or interactively with mssftp) | % msscmd "ls -r [mss_dir]" |
| uberftp | % uberftp mss.ncsa.uiuc.edu "ls -r [mss_dir]" |
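A saved recursive listing can be summarized to see how much of an inventory is on tape versus disk. The following sketch assumes the long-listing layout shown in this document's sample sessions, where field 5 is AR or DK, and counts regular files only. The function name `summarize_residency` is hypothetical.

```shell
#!/bin/sh
# Sketch: count regular files by residency (AR = tape, DK = disk) from a
# long listing on standard input. Assumes field 5 holds AR or DK, as in
# the sample listings in this document. Directories and "total" lines
# are skipped.
summarize_residency() {
    awk '$1 ~ /^-/ && ($5 == "AR" || $5 == "DK") { n[$5]++ }
         END { printf "AR=%d DK=%d\n", n["AR"], n["DK"] }'
}

# Example with lines modeled on the sample sessions.
summarize_residency <<'EOF'
total 64
-rwxr-xr-x 1 jsiwek ac DK common 8 Jan 9 10:12 myfile
drwxr-xr-x 2 jsiwek ac DK common 42 Jan 21 16:03 subdir
-rw------- 1 jsiwek ac AR common 194560 May 13 12:48 oldfile
EOF
```

In practice the input would come from one of the listing methods above, for example by piping the output of the recursive SSH listing into the function.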
5. Sample Sessions and Examples
5.1 Sample SSH Session
Connecting
Use the mss.ncsa.uiuc.edu hostname to connect. Accept the fingerprint if
connecting for the first time, then enter your NCSA Kerberos password.
[jsiwek@co-login ~]$ ssh mss.ncsa.uiuc.edu
The authenticity of host 'mss.ncsa.uiuc.edu (141.142.31.94)' can't be established.
RSA key fingerprint is f7:fc:9f:45:91:9e:c6:70:b6:75:af:bf:b1:ae:b2:bf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mss.ncsa.uiuc.edu' (RSA) to the list of known hosts.
jsiwek@mss.ncsa.uiuc.edu's password:
Last login: Fri Jan 9 11:36:28 2009 from co-login.ncsa.uiuc.edu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
mss ac/jsiwek>
Standard UNIX commands
Many of the basic commands expected of a UNIX-like environment are available.
Getting help on a command
Commands often come with a --help option to help explain their usage.
mss jsiwek/testdir> mkdir --help
Usage: mkdir [OPTION] DIRECTORY...
Displaying Directories
The current directory is obtained with pwd, and is also displayed along
with its parent directory in the shell prompt. A detailed, long listing of a directory
can be seen with ls -l. Navigating into different directories is done
with the cd command.
mss ac/jsiwek> pwd
/UROOT/u/ac/jsiwek
mss ac/jsiwek> ls
test.tar testdir
mss ac/jsiwek> cd testdir
mss jsiwek/testdir> ls -l
total 64
-rwxr-xr-x 1 jsiwek ac DK common 8 Jan 9 10:12 myfile
drwxr-xr-x 2 jsiwek ac DK common 42 Jan 21 16:03 subdir
Manipulating Files and Directories
New directories can be created with mkdir.
New files can be created and edited with vim.
It's possible to display contents and concatenate files with cat, or
to search for patterns contained within files with grep.
mss ac/jsiwek> mkdir dir
mss ac/jsiwek> echo grepForThis > grepMe
mss ac/jsiwek> echo grepForThis > grepMe2
mss ac/jsiwek> cat grepMe
grepForThis
mss ac/jsiwek> grep grepForThis *
grepMe:grepForThis
grepMe2:grepForThis
Deleting Files and Directories
The -r option must be used for directory deletion.
mss ac/jsiwek> rm testdir/myfile
mss ac/jsiwek> rm dir
rm: cannot remove `dir': Is a directory
mss ac/jsiwek> rm -r dir
Listing File/Directory structure and count totals
To see the layout of your directories and files plus the
count total for each, you can use the "tree" command.
The "tree" command takes a directory as an argument. If
no argument is given, tree will use the current working
directory where the command is run from.
[mss ~]$ tree programs
programs
|-- c
| |-- mpi
| | |-- batch
| | | `-- tg.login-hw2-mpi.c.batch
| | |-- hw2-mpi.c
| | |-- hw2-mpi.c.orginal
| | |-- interactive
| | |-- ring-mpi.obj
| | |-- ring-mpi_topo.c
| | |-- ring-mpi_topo.o
| | `-- ring-mpi_topo.tar
| |-- serial
| | |-- batch
| | `-- interactive
| `-- test-work
| |-- a-static.out
| |-- a.out
| |-- hw2-mpi.c
| |-- titan-mpi2hw.batch
| `-- tmp
| |-- gst-tg-login-srl.batch
| |-- tnMPI-061103.e
| `-- tnMPI-061103.o
|-- fortran
| |-- mpi
| | |-- Sample_mpio.f90
| | |-- batch
| | |-- hw2-mpi.f
| | |-- hw2-mpi.f-out
...
| `-- a,out
`-- java
|-- mpi
| |-- batch
| `-- interactive
`-- serial
|-- batch
`-- interactive
24 directories, 37 files
Note: For large, complex directory/file tree structures, you may want to
send the standard output to a file.
Additional Ported MSS commands
stage - copies data from MSS tape archive into MSS disk cache
mss ac/jsiwek> ls -l oldfile
-rw------- 1 jsiwek ac AR common 194560 May 13 12:48 oldfile
mss ac/jsiwek> stage oldfile
oldfile: Staging
0 files remaining.
A few minutes later:
mss ac/jsiwek> ls -l oldfile
-rw------- 1 jsiwek ac DK common 194560 May 13 12:48 oldfile
chfam - change the MSS family of files
mss ac/jsiwek> chfam
Usage: chfam family [filenames]
mss ac/jsiwek> touch newfile
mss ac/jsiwek> ls -l newfile
-rw-r--r-- 1 jsiwek ac DK common 0 Jan 9 12:30 newfile
mss ac/jsiwek> chfam unitest newfile
mss ac/jsiwek> ls -l newfile
-rw-r--r-- 1 jsiwek ac DK unitest 0 Jan 9 12:30 newfile
Transferring Files
MSS can be the source or destination of SSH-based transfers, such as scp.
mss ac/jsiwek> echo "xxxxx" > file
mss ac/jsiwek> scp file jsiwek@cobalt.ncsa.uiuc.edu:~
jsiwek@cobalt.ncsa.uiuc.edu's password:
file 100% 6 0.0KB/s 0.0KB/s 00:00
mss ac/jsiwek> rm file
mss ac/jsiwek> scp jsiwek@cobalt.ncsa.uiuc.edu:~/file .
jsiwek@cobalt.ncsa.uiuc.edu's password:
file 100% 6 0.0KB/s 0.0KB/s 00:00
mss ac/jsiwek> cat file
xxxxx
5.2 Sample FTP Session
The following sample FTP session with MSS uses a subset of common FTP and
MSSFTP commands.
Connecting
To initiate an FTP session to MSS from an NCSA UNIX production machine,
a user enters the mssftp command at the prompt.
[jsiwek@co-login ~]$ mssftp
220 mss.ncsa.uiuc.edu GridFTP Server 3.11 (gcc64dbgpthr, 1213742010-78) [Globus Toolkit 4.2.0] ready.
230-User jsiwek logged in.
ftp>
Getting help on a command
More information on a command is available by specifying the command as
an option to help:
ftp> help rm
Removes the remote file system object(s).
Usage: rm [-r] object1 [object1...objectn]
-r Recursively remove the given directory.
Note: Commands only operate recursively in subdirectories if -r option is given and available.
Displaying the Directory
The example below displays the current MSS directory
path by entering the pwd command and then displays the contents
of the working directory by entering the ls command.
ftp> pwd
/UROOT/u/ac/jsiwek
ftp> ls
drwxr-x--- 6 jsiwek ac DK common 80 Jan 7 16:49 .
drwxr-xr-x 11621 unitree ac DK common 262144 Jan 8 16:19 ..
drwx------ 2 jsiwek ac DK common 24 Dec 18 11:58 .ssh
-rw-r--r-- 1 jsiwek ac DK common 778137600 Dec 30 09:43 test.tar
drwx------ 2 jsiwek ac DK common 16384 Jan 8 13:07 testdir
Transferring Files
The user enters the FTP put
command to transfer the file myfile into MSS. Next, the
chmod command is used to change the permissions of the file to
user read/writable, group readable, and world inaccessible (640). Finally, the group
is changed with chgrp.
ftp> put myfile
myfile: 8 bytes in 1.178463 Seconds (6.789 B/s)
ftp> ls myfile
-rw-r--r-- 1 jsiwek ac DK common 8 Jan 9 11:11 myfile
ftp> chmod 640 myfile
ftp> ls myfile
-rw-r----- 1 jsiwek ac DK common 8 Jan 9 11:11 myfile
ftp> chgrp tornado myfile
ftp> ls myfile
-rw-r----- 1 jsiwek tornado DK common 8 Jan 9 11:11 myfile
For creating and adding users to MSS groups, contact
help@ncsa.uiuc.edu.
Deleting Files
Within mssftp, delete files with the rm command. End the FTP session with MSS by entering quit.
ftp> rm myfile
ftp> ls myfile
No match for myfile
ftp> quit
221 Goodbye.
Note: In Kerberized FTP, rm expands to rmdir, which only deletes directories.
To delete files, use delete.
6. Additional Resources