User Guide to the NCSA UniTree Archival System
Contents
Introduction and Background
UniTree Overview
UniTree Interfaces
General FTP Commands
Additional UniTree FTP Commands
Using UniTree
Using UniTree Effectively
UniTree Families
Tape to Disk Restoring
Redundant File Copies
Saving Files When Renaming
Sample FTP Session
UniTree Overview
UniTree is an archival storage system that supports files of unlimited size in
a UNIX-like file system environment. It manages its disk space by
automatically migrating files off disk onto tape when
space becomes low or if files have not been accessed in a certain length of time.
If you need to retrieve a file that has been migrated to tape, the system will
automatically restore it to UniTree disk before it is copied to
your local system.
UniTree differs from a standard UNIX file system in one significant way. When
a file is deleted off UniTree, the file is actually moved into a
.trash directory in your UniTree home directory (i.e.,
~username/.trash). The filename has a date
and time stamp appended to it when it is moved. The file then resides in
this directory for a number of minutes (called the timeout
interval), after which it is permanently removed from the UniTree file
system. This mechanism allows you to recover accidentally deleted files.
UniTree authenticates you with your NCSA Kerberos password. This is the same
password used for all of your accounts on NCSA production systems.
UniTree Interfaces
NCSA supports three FTP interfaces when using UniTree: Kerberos ftp, mssftp,
and msscmd.
FTP
If you want to access UniTree directly from a machine outside of NCSA, you
will need to use Kerberos FTP.
NCSA's Security page has information on installing Kerberos on your local
machine. Binaries are available for various UNIX platforms, Windows, and
Macintosh.
mssftp and msscmd (described below) are strongly recommended for accessing
UniTree from NCSA UNIX production machines since they use a high-speed
network and therefore provide much faster access than normal ftp.
mssftp
mssftp is available on NCSA UNIX production machines. mssftp will
automatically connect you to UniTree without the need to enter a login name or
password as with FTP.
Enter man mssftp for additional information.
Note 1: By default, prompt is OFF in mssftp.
Note 2: By default, wait is OFF in mssftp. This means that large files that
are on archive could time out before they make it to local disk. msscmd
(described below) is recommended in these cases; alternatively, wait can be
turned on as follows:
ftp> quote wait
msscmd
msscmd is a command line interface to mssftp, and is available on NCSA
UNIX production machines. It can be used both interactively and in batch
scripts.
Enter man msscmd for additional information.
There is also a built-in tar command available. The msscmd man page has
syntax information and examples for this command.
General FTP Commands
Use standard FTP commands such as get, put,
mget, mput, ls, dir, and
rename to manipulate your files locally or remotely.
Enter man ftp for more information on ftp. Also see "Additional UniTree FTP
Commands" for additional commands that are supported by the UniTree FTP
server.
There is also a help utility available that lists the commands
available:
ftp> help
Commands may be abbreviated. Commands are:
! delete mdelete proxy runique
$ debug mdir sendport send
account dir mget put site
append disconnect mkdir pwd size
ascii form mls quit status
bell get mode quote struct
binary glob modtime recv system
bye hash mput reget sunique
case help newer rstatus tenex
cd idle nmap rhelp trace
cdup image nlist rename type
chmod lcd ntrans reset umask
close ls open restart verbose
cr macdef prompt rmdir ?
ftp>
More information on a command is available by specifying the command as
an option to help:
ftp> help mdelete
mdelete delete multiple files
ftp>
Note: mget and mput only operate on files within a directory. They do not work recursively in subdirectories.
Additional UniTree FTP Commands
In order to increase the functionality of the FTP interface into the UniTree
storage system, several additional commands are supported by the UniTree FTP
server. These commands allow for staging files that were migrated to tape,
changing the permissions of files, creating symbolic links, displaying and
setting the trash can timeout interval, and displaying and setting the number
of duplicate copies of files to be stored within UniTree. These commands are
passed to UniTree using the FTP quote command. These special
UniTree commands are case insensitive.
The following is a brief listing of the commands and their syntax:
| Command | Entry | Function |
| --- | --- | --- |
| gtrsh | quote gtrsh | displays the trashcan timeout interval in minutes (initial default 300 mins) |
| strsh | quote strsh time | changes trashcan timeout interval |
| chgrp | quote chgrp groupname filename | changes the group associated with the local file |
| mchgrp | quote mchgrp groupname <wildcard> | changes the group associated with local files using wildcards |
| chmod | quote chmod permissions filename | changes the permissions for a local file |
| mchmod | quote mchmod permissions <wildcard> | changes file permissions using wildcards |
| stage | quote stage wait_time filename | stages (caches) a file from tape to disk |
| mstage | quote mstage wait_time filename(s) | stages (caches) a file or list of files from tape to disk |
| wait | quote wait | toggles on/off waiting for a file to be cached from tape to disk when using the FTP get command |
| nmdup | quote nmdup n | determines how many duplicates of a file (n) are stored in UniTree; maximum is 2 copies (NCSA system default is 2) |
| ln | quote ln file1 file2 | creates a symbolic link |
| direct | quote direct | reads files directly from the tape system without staging them to disk cache; default is off |
| allo | quote allo filesize | specifies the size (in bytes) of the next file that will be put; this improves the efficiency of binary puts |
See the section DiskXtender FTP Commands in the DiskXtender User Guide for
more information on these additional UniTree FTP commands.
dir Additional Fields
Two additional fields are displayed when using the FTP dir
command.
-rw-r--r-- 1 6076 423 DK common 1000000 May 12 13:56 myfile
The first extra field displays the value AR or DK for each file and
directory:
- AR (archive): File or directory is archived to tape and currently resides there.
- DK (disk): File or directory currently resides on UniTree's disk. The file may also
reside on tape, but UniTree uses the disk copy for performance reasons.
The second extra field, displayed after DK or AR, names the "family" to which
the listed file belongs (common in the example above). More information on UniTree Families is presented below.
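For scripted checks, the two extra fields can be picked out of a dir listing line by position. The following is a minimal Python sketch, assuming the column layout of the sample line above (permissions, link count, owner, group, AR/DK, family, size, date, name); the DirEntry type and parse_dir_line function are hypothetical names introduced for illustration and are not part of UniTree.

```python
from typing import NamedTuple, Optional


class DirEntry(NamedTuple):
    permissions: str
    location: str  # "AR" (on tape) or "DK" (on UniTree disk)
    family: str
    size: int
    name: str


def parse_dir_line(line: str) -> Optional[DirEntry]:
    """Parse one line of UniTree `dir` output; return None for other lines."""
    # Split into at most 11 fields so a file name containing spaces stays whole.
    fields = line.strip().split(None, 10)
    if len(fields) < 11 or fields[4] not in ("AR", "DK"):
        return None  # FTP status lines such as "226 Transfer complete."
    try:
        size = int(fields[6])
    except ValueError:
        return None
    return DirEntry(fields[0], fields[4], fields[5], size, fields[10])


if __name__ == "__main__":
    entry = parse_dir_line(
        "-rw-r--r-- 1 6076 423 DK common 1000000 May 12 13:56 myfile"
    )
    print(entry)
```

A script could use the location field to decide whether a file needs to be staged before it is retrieved.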
Using UniTree
Using tar Files
If you frequently store many small- to medium-sized files in UniTree,
create UNIX tar files before saving to UniTree. (When you "tar" files, you
create a single file that is a collection of other files.) For information
about the tar command, enter man tar.
For example, if you have a directory with 50 files that are frequently
accessed together, retrieving a single tar file holding all 50 files is
significantly faster than retrieving each file individually. If stored
individually, the files could potentially reside on 50 different tapes.
When each of these 50 files is accessed again, you would
have to wait for the tapes to be mounted and the files cached to
UniTree disk before they are copied to your local disk. If the 50 files
are combined into a single tar file that resides on tape, only one tape
mount request is generated and only one tape is read to retrieve your files.
Important Note: Do not use tar when the resulting file is larger than
50 Gbytes and you need to access the file interactively. In this case,
retrieval can take longer than the interactive limits.
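The bundling step can also be scripted. Below is a minimal Python sketch using the standard tarfile module; the function names are hypothetical, and reading "50 Gbytes" as 50 * 10**9 bytes is an assumption, since the guide does not say which definition of gigabyte it uses.

```python
import os
import tarfile

# Assumption: "50 Gbytes" is interpreted as 50 * 10**9 bytes.
INTERACTIVE_LIMIT_BYTES = 50 * 10**9


def bundle_directory(src_dir: str, tar_path: str) -> int:
    """Pack every entry under src_dir into one uncompressed tar file.

    tar_path should be outside src_dir so the archive does not include itself.
    Returns the size of the resulting tar file in bytes.
    """
    with tarfile.open(tar_path, "w") as tar:
        for name in sorted(os.listdir(src_dir)):
            # arcname stores the entry relative to src_dir, not its full path.
            tar.add(os.path.join(src_dir, name), arcname=name)
    return os.path.getsize(tar_path)


def too_large_for_interactive(size_bytes: int) -> bool:
    """True if a tar file of this size exceeds the assumed interactive limit."""
    return size_bytes > INTERACTIVE_LIMIT_BYTES
```

The resulting tar file can then be saved to UniTree with a single put, and the size check gives an early warning before a very large bundle is created for interactive use.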
Preserving File Creation Information
The current FTP protocol standard does not support saving files with the remote
site's creation and modification times, so this information will be lost when
the files are retrieved from UniTree. When a file is retrieved, its access and
creation times are set to the time of retrieval.
One way to preserve files' creation times and other file attributes
(such as permissions) is to tar the files. The file attributes of
the tar file will be lost, but the attributes of the files within the tarball
will be preserved.
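The point about tar can be checked locally. The sketch below, which assumes a POSIX system and uses Python's standard tarfile module, gives a file a known modification time and permission mode, tars it, resets the tar file's own timestamp (standing in for an FTP transfer), and then reads the member's attributes back from inside the archive. The file names and helper functions are hypothetical.

```python
import os
import tarfile
import tempfile


def recorded_attributes(tar_path: str) -> dict:
    """Map each member name to (mtime, permission bits) stored in the tar file."""
    with tarfile.open(tar_path) as tar:
        return {m.name: (int(m.mtime), m.mode & 0o777) for m in tar.getmembers()}


def demo() -> bool:
    """Return True if the member's attributes survive inside the tarball."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "results.dat")
        with open(src, "w") as f:
            f.write("data\n")
        os.chmod(src, 0o644)
        os.utime(src, (800000000, 800000000))  # fixed access/modification time

        tar_path = os.path.join(work, "results.tar")
        with tarfile.open(tar_path, "w") as tar:
            tar.add(src, arcname="results.dat")

        # Stand-in for a transfer: the tar file's own timestamp becomes "now".
        os.utime(tar_path, None)
        return recorded_attributes(tar_path)["results.dat"] == (800000000, 0o644)


if __name__ == "__main__":
    print(demo())
```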
Tape to Disk Restoring
When a file needs to be retrieved from UniTree tape, it is normally first copied to UniTree's disk and then transferred to the remote host. (See dir Additional Fields for information on how to determine if your file is on tape or disk.) To make room for files being read from tape, old files on the UniTree disk are purged.
During heavy usage times when there are many pending tape mount requests,
accessing a file that has been migrated to tape could take several minutes. If
the process is taking too long and you want to interrupt, use your interrupt
key (usually CONTROL-C). UniTree will continue to copy your file
to its disk even after you interrupt the process. To see if the file
has been copied to UniTree's disk, use the dir command in FTP or
try accessing the file again and see if you are put into the wait state. You
can also prestage the files (described below).
If the file is large and will not be read again, it is possible to bypass the transfer to disk with the command quote direct.
Improving Access Times Using Prestaging
UniTree can prestage files that have been archived to tape. To copy the files
to UniTree's disk for near-future access, use the ftp stage or
mstage command (see Additional UniTree FTP
Commands for more details). stage and mstage
copy your files to UniTree disk without actually transferring them to the
local filesystem so that when you later attempt to retrieve your files, they
can be transferred directly from UniTree disk.
For example, if you have files file1, file2, and
file3 in your home directory in UniTree and they all reside on
tape, you can copy them to UniTree disk by entering the following three
commands within UniTree with a zero wait time:
ftp> quote site stage 0 file1
ftp> quote site stage 0 file2
ftp> quote site stage 0 file3
The commands above cannot be combined on one line.
Alternatively, you can execute the command:
ftp> quote site mstage 0 file1 file2 file3
or
ftp> quote site mstage 0 file[1-3]
There is no penalty for using the stage and mstage
commands on files that already exist on disk. The command returns the message
"UniTree command successful." If the file exists solely on tape and you use
the staging commands on this file, the message "File is being retrieved from
the archive" is displayed.
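The same requests can be issued from a script. The following Python sketch builds the command strings shown above and sends them through an object that behaves like ftplib.FTP, whose sendcmd method plays the role of the client's quote command. It is an illustration under assumptions: the function names are hypothetical, the standard ftplib module has no Kerberos support, and how the connection is opened and authenticated is left to the caller.

```python
from typing import Iterable, List


def stage_commands(filenames: Iterable[str], wait_time: int = 0) -> List[str]:
    """One 'site stage' command per file, matching the interactive examples."""
    return [f"site stage {wait_time} {name}" for name in filenames]


def mstage_command(filenames: Iterable[str], wait_time: int = 0) -> str:
    """A single 'site mstage' command covering all files."""
    return f"site mstage {wait_time} " + " ".join(filenames)


def prestage(ftp, filenames: Iterable[str], wait_time: int = 0) -> List[str]:
    """Send one stage request per file and return the server replies.

    `ftp` is assumed to be an already logged-in object with an
    ftplib.FTP-style sendcmd(); ftplib raises an exception on 4xx/5xx replies.
    """
    return [ftp.sendcmd(cmd) for cmd in stage_commands(filenames, wait_time)]
```

Keeping the command construction separate from the sending step makes it possible to log or review the requests before anything is sent to the server.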
HINT: Only stage the files that have a near-immediate need. Prestaging
all your files could purge other files on disk that you wanted to
use. Or, if you prestage your files too far in advance, they could be purged
from disk before you have a chance to access them.
stage in Batch Scripts
Use the staging commands at the beginning of your batch scripts for all the
files that you are going to be retrieving during that batch session. When your
batch script attempts to retrieve these files later, they may have already been
copied to disk and be ready for immediate transfer (no waiting for a tape
mount).
UniTree Families
If you always access a large number of files at the same time, it
might be useful to have all the files belong to the same family.
All files that belong to a family reside on a common set of tapes,
so there are fewer tape mounts required to access the files.
As an approximate rule of thumb, if you save more than 500 files every month,
it may be worth asking for a family. To request a family, send email to
help@ncsa.uiuc.edu.
Redundant File Copies
The UniTree archival system offers you extra file protection when archiving
your files. By default at NCSA, files are stored into the archival system
redundantly (i.e., two copies in different locations). The maximum redundancy
available at NCSA is two.
Saving Files When Renaming
UniTree saves the older name of a file in your .trash
directory when you use the rename command. This filename is a
link and is deleted when your trash can timeout limit expires.
Sample FTP Session
The following sample FTP session with UniTree uses a subset of common FTP and
UniTree FTP commands.
Connecting
To initiate an FTP session to UniTree from an NCSA UNIX production machine,
a user named mccoy enters the
mssftp command at the prompt.
cu12 % mssftp
Connecting to host mss-cu.ncsa.uiuc.edu...Connected
220-
UNIX Archive FTP server (DiskXtender Version 2.6) active.
Checking DiskXtender.conf
220 UNIX Archive FTP server ready.
Doing big TCP windows with files larger than 2097152 bytes.
230 User mccoy logged in.
200 Type set to I.
Using binary mode to transfer files.
ftp>
Displaying the Directory
In the example below, mccoy displays the current UniTree directory
path by entering the pwd command and then displays the contents
of the working directory by entering the dir command.
ftp> pwd
257 "/u/ac/mccoy" is current directory.
ftp> dir
200 PORT command successful.
150 Opening ASCII mode data connection for /usr/local/unitree/162/bin/ddir (0 bytes).
drwxr-xr-x 2 6076 423 DK common 8192 Apr 12 10:42 .trash
-rw-r--r-- 1 6076 423 AR common 1000000 May 12 13:56 myfile
226 Transfer complete.
76 bytes received in 0.16 seconds (0.47 Kbytes/s)
HINT: Don't break out of dir or ls commands
using CONTROL-C. Your ftp client will probably "hang" for several
minutes if you do.
Trash Directory
The UniTree gtrsh command displays the trash can timeout interval. This
value tells how long a file remains in the .trash directory before it is
removed permanently from the UniTree system.
To recover a deleted file before the interval expires, use the FTP rename
command to move it out of the trash can and into another directory.
Entered without an argument, the nmdup command reports the current number of
tape copies.
ftp> quote gtrsh
250 Trashcan interval is set to 300 minutes.
ftp> quote nmdup
200 Current number of tape copies is 2
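A script that needs to know how long a deleted file stays recoverable can read the interval out of the gtrsh reply. The Python sketch below assumes the reply wording shown in the session above and assumes the interval is counted from the moment of deletion; both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional


def trash_interval_minutes(reply: str) -> Optional[int]:
    """Extract the interval from a reply such as
    '250 Trashcan interval is set to 300 minutes.'; None if it does not match."""
    match = re.search(r"interval is set to (\d+) minutes", reply, re.IGNORECASE)
    return int(match.group(1)) if match else None


def removal_time(deleted_at: datetime, reply: str) -> Optional[datetime]:
    """Estimate when a file deleted at `deleted_at` leaves the .trash directory."""
    minutes = trash_interval_minutes(reply)
    if minutes is None:
        return None
    return deleted_at + timedelta(minutes=minutes)
```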
Transferring Files
The default file transfer mode is binary. User mccoy enters the FTP put
command to transfer the file myfile into UniTree. Next the
chmod command is used to change the permissions of the file to
user read/writable, group readable, and world readable (644).
ftp> put myfile
200 PORT command successful.
150 Opening BINARY mode data connection for myfile.
226 Transfer complete.
local: myfile remote: myfile
1000000 bytes sent in 1.9 seconds (5.2e+02 Kbytes/s)
ftp> quote chmod 644 myfile
250 UniTree CHMOD command successful.
Deleting Files
Delete files with the delete command (for a single file) or the mdelete command (for multiple files). When the delete command is entered, myfile is moved to the .trash directory with a date/time stamp appended. Display the contents of the .trash directory using the dir command. The file remains in the .trash directory for the number of minutes reported by the gtrsh command. End the FTP session with UniTree by entering quit.
ftp> delete myfile
250 UniTree DELE command successful.
ftp> dir .trash
200 PORT command successful.
150 Opening ASCII mode data connection for /usr/local/unitree/162/bin/ddir (0 bytes)
-rw------- 1 mccoy ac DK common 48727 May 12 09:38 myfile#05-19-95#09:38:53CDT#0063
226 Transfer complete.
remote: .trash
36 bytes received in 0.58 seconds (0.06 Kbytes/s)
ftp> quit
221 Goodbye.
Saving and Restoring Examples
The built-in tar command can also be used to save or restore a tar file from
UniTree.
See the section USING TAR in the msscmd man page for syntax
details and examples.
To save the current directory to a tar file using the built-in tar command:
% msscmd "tar cf filename.tar ."
To restore a tar file from UniTree into the current directory:
% msscmd "tar xf filename.tar"