User Guide to the NCSA UniTree Archival System

Contents

Introduction and Background

UniTree Overview
UniTree Interfaces
General FTP Commands
Additional UniTree FTP Commands

Using UniTree

Using UniTree Effectively
UniTree Families
Tape to Disk Restoring
Redundant File Copies
Saving Files When Renaming
Sample FTP Session

UniTree Overview

UniTree is an archival storage system that supports files of unlimited size in a UNIX-like file system environment. It manages its disk space by automatically migrating files off disk onto tape when space becomes low or if files have not been accessed in a certain length of time. If you need to retrieve a file that has been migrated to tape, the system will automatically restore it to UniTree disk before it is copied to your local system.

UniTree differs from a standard UNIX file system in one significant way. When a file is deleted off UniTree, the file is actually moved into a .trash directory in your UniTree home directory (i.e., ~username/.trash). The filename has a date and time stamp appended to it when it is moved. The file then resides in this directory for a number of minutes (called the timeout interval), after which it is permanently removed from the UniTree file system. This mechanism allows you to recover accidentally deleted files.

UniTree uses NCSA's Kerberos password. This is the same password used for all of your accounts on NCSA production systems.

UniTree Interfaces

NCSA supports three FTP interfaces when using UniTree: Kerberos ftp, mssftp, and msscmd.

FTP

If you want to access UniTree directly from a machine outside of NCSA, you will need to use Kerberos FTP. NCSA's Security page has information on installing Kerberos on your local machine. Binaries are available for various UNIX platforms, Windows, and Macintosh.

mssftp and msscmd (described below) are strongly recommended for accessing UniTree from NCSA UNIX production machines since they use a high-speed network, and therefore provide much faster access than normal ftp.

mssftp

mssftp is available on NCSA UNIX production machines. mssftp will automatically connect you to UniTree without the need to enter a login name or password as with FTP.

Enter man mssftp for additional information.

Note 1: By default, prompt is OFF in mssftp.
Note 2: By default, wait is OFF in mssftp. This means that large files that are on archive could time out before they make it to local disk. msscmd (described below) is recommended in these cases; alternatively, wait can be turned on as follows:
ftp> quote wait

msscmd

msscmd is a command line interface to mssftp, and is available on NCSA UNIX production machines. It can be used both interactively and in batch scripts. Enter man msscmd for additional information. There is also a built-in tar command available. The msscmd man page has syntax information and examples on this command.

General FTP Commands

Use standard FTP commands such as get, put, mget, mput, ls, dir, and rename to manipulate your files locally or remotely. Enter man ftp for more information on ftp. Also see "Additional UniTree FTP Commands" for additional commands that are supported by the UniTree FTP server.

There is also a help utility available that lists the commands available:

ftp> help
Commands may be abbreviated.  Commands are:

!               delete          mdelete         proxy           runique
$               debug           mdir            sendport        send
account         dir             mget            put             site
append          disconnect      mkdir           pwd             size
ascii           form            mls             quit            status
bell            get             mode            quote           struct
binary          glob            modtime         recv            system
bye             hash            mput            reget           sunique
case            help            newer           rstatus         tenex
cd              idle            nmap            rhelp           trace
cdup            image           nlist           rename          type
chmod           lcd             ntrans          reset           umask
close           ls              open            restart         verbose
cr              macdef          prompt          rmdir           ?
ftp>

More information on a command is available by specifying the command as an option to help:

ftp> help mdelete
mdelete         delete multiple files
ftp>

Note: mget and mput operate only on files within a single directory. They do not recurse into subdirectories.
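
When a directory tree has to be stored as individual files, one workaround is to generate an explicit mkdir and put command for every directory and file. The following is a minimal sketch, not an NCSA-provided tool: emit_tree_commands is a hypothetical helper name, and it assumes that file and directory names contain no whitespace.

```shell
# Sketch: print ftp commands that recreate a local directory tree remotely.
# emit_tree_commands is a hypothetical helper, not a UniTree command.
# Assumes file and directory names contain no whitespace.
emit_tree_commands() {
    root=$1
    # Sorting in the C locale lists each directory before its subdirectories,
    # so every mkdir is printed before anything is stored inside it.
    find "$root" -type d | LC_ALL=C sort | while IFS= read -r dir; do
        printf 'mkdir %s\n' "$dir"
    done
    # One put per file, using the same relative path on both sides.
    find "$root" -type f | LC_ALL=C sort | while IFS= read -r file; do
        printf 'put %s %s\n' "$file" "$file"
    done
}
```

Run it from the parent of the directory to be stored (for example, emit_tree_commands proj) and review the printed commands before entering them in an FTP session.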

Additional UniTree FTP Commands

In order to increase the functionality of the FTP interface into the UniTree storage system, several additional commands are supported by the UniTree FTP server. These commands allow for staging files that were migrated to tape, changing the permissions of files, creating symbolic links, displaying and setting the trash can timeout interval, and displaying and setting the number of duplicate copies of files to be stored within UniTree. These commands are passed to UniTree using the FTP quote command. These special UniTree commands are case insensitive.

The following is a brief listing of the commands and their syntax:

Command   Entry                                Function
gtrsh     quote gtrsh                          displays the trashcan timeout interval in minutes (initial default 300 minutes)
strsh     quote strsh time                     changes the trashcan timeout interval
chgrp     quote chgrp groupname filename       changes the group associated with the local file
mchgrp    quote mchgrp groupname <wildcard>    changes the group associated with local files using wildcards
chmod     quote chmod permissions filename     changes the permissions for a local file
mchmod    quote mchmod permissions <wildcard>  changes file permissions using wildcards
stage     quote stage wait_time filename       stages (caches) a file from tape to disk
mstage    quote mstage wait_time filename(s)   stages (caches) a file or list of files from tape to disk
wait      quote wait                           toggles on/off waiting for a file to be cached from tape to disk when using the FTP get command
nmdup     quote nmdup n                        determines how many duplicates of a file (n) are stored in UniTree; maximum is 2 copies (NCSA system default is 2)
ln        quote ln file1 file2                 creates a symbolic link
direct    quote direct                         reads files directly from the tape system without staging them to disk cache (default is off)
allo      quote allo filesize                  specifies the size in bytes of the next file that will be put; this improves the efficiency of binary puts

See the section DiskXtender FTP Commands in the DiskXtender User Guide for more information on these additional UniTree FTP commands.
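
The allo command takes the size of the next file in bytes. As a minimal sketch, the size can be read from the local file just before the put. emit_allo_put is a hypothetical helper name; it only prints the two commands and does not contact UniTree.

```shell
# Sketch: print the "quote allo" and "put" pair for one local file.
# emit_allo_put is a hypothetical helper, not part of mssftp or msscmd.
emit_allo_put() {
    file=$1
    # wc -c counts bytes; the arithmetic expansion strips the padding
    # that some wc implementations add in front of the number.
    size=$(( $(wc -c < "$file") ))
    printf 'quote allo %s\nput %s\n' "$size" "$file"
}
```

For a 1000000-byte file named myfile, this prints "quote allo 1000000" followed by "put myfile".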

dir Additional Fields

Two additional fields are displayed when using the FTP dir command.

 -rw-r--r--  1 6076  423   DK   common 1000000 May 12 13:56 myfile

The first extra field displays the value AR or DK for each file and directory:

AR (archive)
File or directory is archived to tape and currently resides there.
DK (disk)
File or directory currently resides on UniTree's disk. The file may also reside on tape, but UniTree uses the disk copy for performance reasons.

The second extra field, displayed after DK or AR, specifies the "family" to which the listed file belongs. More information on UniTree Families is presented below.
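
A saved dir listing can be summarized with a short filter. The sketch below assumes the 11-column layout of the sample line above (residency in column 5, family in column 6, name last) and names without spaces; summarize_listing is a hypothetical helper name.

```shell
# Sketch: read dir listing lines on stdin and print "name residency family".
# Assumes the 11-column layout of the sample line and names without spaces.
# Lines such as FTP status replies do not match and are skipped.
summarize_listing() {
    awk 'NF >= 11 && ($5 == "AR" || $5 == "DK") { print $NF, $5, $6 }'
}
```

For the sample line above, the filter prints "myfile DK common".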

Using UniTree

Using tar Files

If you frequently store many small- to medium-sized files in UniTree, create UNIX tar files before saving to UniTree. (When you "tar" files, you create a single file that is a collection of other files.) For information about the tar command, enter man tar.

For example, if you have a directory with 50 files that are frequently accessed together, retrieving a single tar file holding all 50 files is significantly faster than retrieving each file individually. If stored individually, the files could potentially reside on 50 different tapes. When each of these 50 files is accessed again, you would have to wait for the tapes to be mounted and the files cached to UniTree disk before they are copied to your local disk. If the 50 files are combined into a single tar file that resides on tape, only one tape mount request is generated and only one tape is read to retrieve your files.

Important Note: Do not use tar when the resulting file is larger than 50 Gbytes and you need to access the file interactively. In this case, retrieval can take longer than the interactive limits.
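
The bundling step and the size check can be combined in a small helper. This is a sketch under stated assumptions: bundle_dir is a hypothetical name, and 50 Gbytes is taken as 50 * 10^9 bytes because the guide does not define the unit.

```shell
# 50 Gbytes taken as 50 * 10^9 bytes (an assumption; the unit is not defined here).
LIMIT_BYTES=$((50 * 1000 * 1000 * 1000))

# bundle_dir DIR OUT.tar: hypothetical helper that tars DIR into OUT.tar
# and prints the size of the tar file in bytes.
bundle_dir() {
    src=$1
    out=$2
    # -C keeps only the last path component of DIR inside the tar file.
    tar cf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    size=$(( $(wc -c < "$out") ))
    if [ "$size" -gt "$LIMIT_BYTES" ]; then
        echo "warning: $out is larger than 50 Gbytes; avoid retrieving it interactively" >&2
    fi
    echo "$size"
}
```

Write OUT.tar outside DIR so that the tar file does not try to include itself.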

Preserving File Creation Information

The current FTP protocol standard does not support saving files with the remote site's creation and modification times, so this information is lost when files are retrieved from UniTree. When a file is retrieved, its access and creation times are set to the time of retrieval.

One way to preserve files' creation times and other file attributes (such as permissions) is to tar the files. The file attributes of the tar file will be lost, but the attributes of the files within the tarball will be preserved.
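
The effect can be checked locally before relying on it. In the sketch below, roundtrip_check is a hypothetical helper, and a plain cp stands in for the put to UniTree and the later get; it shows that the modification time and permissions recorded inside the tar file survive even though the tar file itself gets a fresh timestamp.

```shell
# roundtrip_check WORKDIR: hypothetical demo; WORKDIR must be an empty directory.
roundtrip_check() {
    work=$1
    mkdir "$work/src" "$work/dst" || return 1
    echo "sample data" > "$work/src/report.txt"
    chmod 640 "$work/src/report.txt"
    touch -t 200001011200 "$work/src/report.txt"   # backdate the modification time
    tar cf "$work/bundle.tar" -C "$work/src" report.txt
    # Stand-in for the transfer to and from UniTree: the copy gets a new timestamp.
    cp "$work/bundle.tar" "$work/retrieved.tar"
    # p restores the recorded permissions; tar restores modification times by default.
    tar xpf "$work/retrieved.tar" -C "$work/dst"
    # Both lines should show the same permissions and the same backdated time.
    ls -l "$work/src/report.txt" "$work/dst/report.txt"
}
```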

Tape to Disk Restoring

When a file needs to be retrieved from UniTree tape, it is normally first copied to UniTree's disk and then transferred to the remote host. (See dir Additional Fields for information on how to determine whether your file is on tape or disk.) To make room for files being read from tape, old files on the UniTree disk are purged.

During heavy usage times when there are many pending tape mount requests, accessing a file that has been migrated to tape could take several minutes. If the process is taking too long and you want to interrupt, use your interrupt key (usually CONTROL-C). UniTree will continue to copy your file to its disk even after you interrupt the process. To see if the file has been copied to UniTree's disk, use the dir command in FTP or try accessing the file again and see if you are put into the wait state. You can also prestage the files (described below).

If the file is large and will not be read again, it is possible to bypass the transfer to disk with the command quote direct.

Improving Access Times Using Prestaging

UniTree can prestage files that have been archived to tape. To copy the files to UniTree's disk for near-future access, use the ftp stage or mstage command (see Additional UniTree FTP Commands for more details). stage and mstage copy your files to UniTree disk without actually transferring them to the local filesystem, so that when you later attempt to retrieve your files, they can be transferred directly from UniTree disk.

For example, if you have files file1, file2, and file3 in your home directory in UniTree and they all reside on tape, you can copy them to UniTree disk by entering the following three commands within UniTree with a zero wait time:

 ftp> quote site stage 0 file1
 ftp> quote site stage 0 file2
 ftp> quote site stage 0 file3

The commands above cannot be combined on one line.

Alternatively, you can execute the command:

 ftp> quote site mstage 0 file1 file2 file3
or

 ftp> quote site mstage 0 file[1-3]

There is no penalty for using the stage and mstage command on files that already exist on disk. The command returns the message "UniTree command successful." If the file exists solely on tape and you use the staging commands on this file, the message "File is being retrieved from the archive" is displayed.

HINT: Stage only the files you will need almost immediately. Prestaging all your files could purge other files on disk that you wanted to use, and files prestaged too far in advance could be purged from disk before you have a chance to access them.

stage in Batch Scripts

Use the staging commands at the beginning of your batch scripts for all the files that you are going to be retrieving during that batch session. When your batch script attempts to retrieve these files later, they may have already been copied to disk and be ready for immediate transfer (no waiting for a tape mount).
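
A batch script can generate its command sequence so that staging always comes first. The following is a minimal sketch: emit_staged_session is a hypothetical helper, it uses the quote site mstage form from the examples above (the command table lists the form without site, so use whichever your server accepts), and it assumes the mssftp default in which wait starts OFF, so a single quote wait turns it on.

```shell
# Sketch: print an ftp command sequence that stages every file, then fetches them.
# emit_staged_session is a hypothetical helper; it does not contact UniTree.
emit_staged_session() {
    # One mstage request with a zero wait time covers all files.
    printf 'quote site mstage 0 %s\n' "$*"
    # Assumes wait starts OFF (the mssftp default), so this toggles it ON.
    printf 'quote wait\n'
    for f in "$@"; do
        printf 'get %s\n' "$f"
    done
    printf 'quit\n'
}
```

If mssftp reads commands from standard input the way a standard ftp client does (an assumption to confirm with man mssftp), the output could be piped to it, for example: emit_staged_session file1 file2 file3 | mssftp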

UniTree Families

If you always access a large number of files at the same time, it might be useful to have all the files belong to the same family. All files that belong to a family reside on a common set of tapes, so there are fewer tape mounts required to access the files. An approximate rule of thumb is that if you save more than 500 files every month it may be worth asking for a family. Users can request a family to be assigned to them by sending email to help@ncsa.uiuc.edu.

Redundant File Copies

The UniTree archival system offers you extra file protection when archiving your files. By default at NCSA, files are stored into the archival system redundantly (i.e., two copies in different locations). The maximum redundancy available at NCSA is two.

Saving Files When Renaming

UniTree saves the older name of a file in your .trash directory when you use the rename command. This filename is a link and is deleted when your trash can timeout limit expires.

Sample FTP Session

The following sample FTP session with UniTree uses a subset of common FTP and FTP UniTree commands.

Connecting

To initiate an FTP session to UniTree from an NCSA UNIX production machine, a user named mccoy enters the mssftp command at the prompt.

cu12 % mssftp
Connecting to host mss-cu.ncsa.uiuc.edu...Connected
220- 
UNIX Archive FTP server (DiskXtender Version 2.6) active. 
Checking DiskXtender.conf
 
220 UNIX Archive FTP server ready.
Doing big TCP windows with files larger than 2097152 bytes.
230 User mccoy logged in.
200 Type set to I.
Using binary mode to transfer files.
ftp> 

Displaying the Directory

In the example below, mccoy displays the current UniTree directory path by entering the pwd command and then displays the contents of the working directory by entering the dir command.

 ftp> pwd
 257 "/u/ac/mccoy" is current directory. 
 ftp> dir
 200 PORT command successful.
 150 Opening ASCII mode data connection for /usr/local/unitree/162/bin/ddir (0 bytes).
 drwxr-xr-x  2 6076  423   DK   common    8192 Apr 12 10:42 .trash
 -rw-r--r--  1 6076  423   AR   common 1000000 May 12 13:56 myfile
 226 Transfer complete.
 76 bytes received in 0.16 seconds (0.47 Kbytes/s)

HINT: Don't break out of dir or ls commands using CONTROL-C. Your ftp client will probably "hang" for several minutes if you do.

Trash Directory

The UniTree gtrsh command checks the length of time of the trash can interval. This value tells how long a file remains in the .trash directory before it is removed permanently from the UniTree system.

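Use the FTP rename command to move a file out of the trash can and into another directory. In the example below, gtrsh reports the trash can interval, and nmdup, entered without an argument, reports the current number of tape copies.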

 ftp> quote gtrsh
 250 Trashcan interval is set to 300 minutes.
 ftp> quote nmdup
 200 Current number of tape copies is 2

Transferring Files

The default file transfer mode is binary. User mccoy enters the FTP put command to transfer the file myfile into UniTree. Next the chmod command is used to change the permissions of the file to user read/writable, group readable, and world readable (644).

 ftp> put myfile
 200 PORT command successful.
 150 Opening BINARY mode data connection for myfile.
 226 Transfer complete.
 local: myfile remote: myfile
 1000000 bytes sent in 1.9 seconds (5.2e+02 Kbytes/s)
 ftp> quote chmod 644 myfile
 250 UniTree CHMOD command successful.

Deleting Files

Delete files with the delete command (for a single file) and mdelete command (for multiple files). Display the contents of the .trash directory using the dir command. When the delete command is entered, myfile is moved to the .trash directory with a date/time stamp appended. This file remains in the .trash directory for the number of minutes reported if you enter the gtrsh command. End the FTP session with UniTree by entering quit.

 ftp> delete myfile
 250 UniTree DELE command successful.
 ftp> dir .trash
 200 PORT command successful.
 150 Opening ASCII mode data connection for /usr/local/unitree/162/bin/ddir (0 bytes)
 -rw-------  1 mccoy ac   DK  common    48727 May 12 09:38 myfile#05-19-95#09:38:53CDT#0063
 226 Transfer complete.
 remote: .trash
 36 bytes received in 0.58 seconds (0.06 Kbytes/s)
 ftp> quit
 221 Goodbye.
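
As noted under Trash Directory, a file still listed in .trash can be moved back with the FTP rename command. The sketch below builds that command from a trash name. It assumes the trash name is the original name followed by '#' and the date/time stamp, as in the listing above, and that the original name contains no '#'. emit_restore is a hypothetical helper name.

```shell
# Sketch: print the rename command that restores a trashed file
# under its original name in the current remote directory.
# emit_restore is a hypothetical helper, not a UniTree command.
emit_restore() {
    trashed=$1
    # Drop everything from the first '#' onward to recover the original name.
    original=${trashed%%#*}
    printf 'rename .trash/%s %s\n' "$trashed" "$original"
}
```

The printed command is meant to be entered at the ftp> prompt from your UniTree home directory, before the trash can interval expires.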

Saving and Restoring Examples

The built-in tar command can also be used to save or restore a tar file from UniTree. See the section USING TAR in the msscmd man page for syntax details and examples.

To save the current directory to a tar file using the built-in tar command:

 % msscmd "tar cf filename.tar ." 

To restore a tar file from UniTree into the current directory:

 % msscmd "tar xf filename.tar"
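
In a batch script it can help to give each saved directory a dated tar file name. The following is a sketch only: archive_name and save_current_dir are hypothetical helpers, the naming scheme is an illustration, and the msscmd call uses the documented tar cf form shown above.

```shell
# archive_name DIR: hypothetical helper; prints "<last component of DIR>-YYYYMMDD.tar".
archive_name() {
    printf '%s-%s.tar\n' "$(basename "$1")" "$(date +%Y%m%d)"
}

# save_current_dir: runs the msscmd tar form shown above if msscmd is installed;
# otherwise it only prints the command it would have run.
save_current_dir() {
    name=$(archive_name "$PWD")
    if command -v msscmd >/dev/null 2>&1; then
        msscmd "tar cf $name ."
    else
        echo "would run: msscmd \"tar cf $name .\""
    fi
}
```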