
Data Transfer Example Database


Selected Category Information

MSS Transfers

Transfers to or from the NCSA Mass Storage System (MSS). MSS is a tape-based archival facility for permanent data storage. Data is staged from tape to a disk cache, from which it can then be downloaded. For more details about MSS, visit the MSS homepage.

The connectivity of each cluster to the Mass Storage System (MSS) varies. In general, multiple transfer streams
will achieve the best aggregate transfer rates. To maximize the efficiency of each
individual transfer, use the following guidelines:

  • Always specify the "active" option when *writing* to MSS.
  • Gets are more forgiving of the FTP "MODE" used, so when retrieving data from MSS more options are available that won't severely impact performance.
  • Be mindful of the archive retrieval process. If a file resides on the tape archive, a "get" requests staging of the file and then returns without retrieving it. A subsequent "get" command must be issued after tape retrieval is completed.
  • Avoid globus-url-copy, especially when writing.
  • Use msscmd or mssftp or uberftp whenever possible.

Third Party Transfers

Transfers that use a client to initiate server to server communication.

  • Third party transfers use the dedicated file transfer resources and are generally faster.
  • They do not impact the performance of the login node.
  • They can be issued from any host with a proxy.

globus-url-copy Transfers

Transfers using the globus-url-copy transfer client. globus-url-copy is a command line client provided by the Globus project that transfers data using different protocols when provided with a URL description of the source and destination files.

  • Use the "-vb" option to see performance statistics and source/destination recap.
  • Providing an appropriate value for the "-tcp-bs" option ensures efficient buffer usage and can often have a dramatic effect on performance.

uberftp Transfers

Transfers using the uberftp grid-enabled FTP client.

  • All uberftp examples use the command line syntax in which the commands are specified as a semicolon-separated list. In all cases, specifying these commands in the same sequence in an interactive uberftp session will achieve the same result.

ssh/scp Transfers

Transfers using the PSC-developed HPN-SSH enhancements to OpenSSH.

  • The following transfers can be performed from a Linux workstation outside of the NCSA domain. A valid NCSA-issued Kerberos ticket can be obtained by running kinit, thus enabling secure passwordless access to NCSA HPC resources.
  • To enable passwordless ssh/scp transfers between multiple TeraGrid and/or NCSA systems, valid GSI credentials can be used.

GridFTP Transfers

Transfers that use a GridFTP server installation at one or both endpoints.

Recursive Directory Transfers

Moving the entire contents and directory structure below a top-level (specified) directory.

TeraGrid Transfers

Transfers that take place between systems on the TeraGrid network using software available at all TeraGrid sites.

msscmd/mssftp Transfers

Transfers to the NCSA Mass Storage System from production platforms using msscmd/mssftp.

tgcp Transfers

Transfers on TeraGrid using the tgcp command.

RFT Transfers

Transfers using the Globus Reliable File Transfer web service.



Example Listing

MSS -> Hg

Transfer to MSS from NCSA IA-64 (Mercury) using uberftp.

> uberftp mss.ncsa.teragrid.org "active; put 2GB"
UNIX Archive FTP server (DiskXtender Version 2.9) active. Checking DiskXtender.conf

220 UNIX Archive FTP server ready.
230 User dadams logged in.
Active mode
Transfer of 2147287040 bytes completed in 48.30 seconds.  44457.07 KB/sec
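
The summary line that closes an uberftp transcript can be checked, or converted to other units, with a few lines of code. The following is a minimal Python sketch, not part of uberftp: the regular expression and the helper name parse_summary are illustrative assumptions based only on the line format shown in the transcripts on this page. Dividing bytes by seconds and then by 1000 reproduces the reported figure to within rounding, which suggests (but does not prove) that the client reports decimal kilobytes.

```python
import re

# Assumed format of the uberftp summary line, taken from the transcripts on this page.
SUMMARY = re.compile(
    r"Transfer of (?P<bytes>\d+) bytes completed in (?P<secs>[\d.]+) seconds\.\s+"
    r"(?P<rate>[\d.]+) KB/sec"
)


def parse_summary(line):
    """Return (bytes, seconds, reported KB/sec) from an uberftp summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a transfer summary line: %r" % line)
    return int(match["bytes"]), float(match["secs"]), float(match["rate"])


line = "Transfer of 2147287040 bytes completed in 48.30 seconds.  44457.07 KB/sec"
nbytes, seconds, reported_kb = parse_summary(line)

# Recompute the rate, assuming 1 KB = 1000 bytes.
computed_kb = nbytes / seconds / 1000
computed_mb = nbytes / seconds / 1e6

print(f"reported {reported_kb:.2f} KB/sec, recomputed {computed_kb:.2f} KB/sec")
print(f"about {computed_mb:.1f} MB/sec")
```

The small difference between the reported and recomputed rates is expected, because the elapsed time in the transcript is rounded to two decimal places.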

 


MSS -> Hg (3rd Party)

Transfer to MSS from NCSA IA-64 (Mercury) using uberftp and third party.

> uberftp mss.ncsa.teragrid.org "lopen gridftp-hg.ncsa.teragrid.org; lcd /gpfs_scratch1/dadams; active; put 2GB"
UNIX Archive FTP server (DiskXtender Version 2.9) active. Checking DiskXtender.conf

220 UNIX Archive FTP server ready.
230 User dadams logged in.
220 tg-s039.ncsa.teragrid.org GridFTP Server 2.1 (gcc64dbg, 1122653280-63) ready.
230 User dadams logged in.
250 CWD command successful.
Active mode
Transfer of 2147287040 bytes completed in 39.49 seconds.  54374.96 KB/sec

 


tg-login.uc -> MSS

Third party transfer to MSS from remote TG site.

  • Note the increased tcp buffer size setting.
> uberftp mss.ncsa.teragrid.org "lopen tg-gridftp.uc.teragrid.org; active; mode stream; tcpbuf 4000000; lcd /disks/scratchgpfs1/adams; put 2GB"
UNIX Archive FTP server (DiskXtender Version 2.9) active. Checking DiskXtender.conf

220 UNIX Archive FTP server ready.
230 User dadams logged in.
220 tg-s003.uc.teragrid.org GridFTP Server 2.1 (gcc32dbg, 1122653280-63) ready.
230 User adams logged in.
Active mode
Stream mode
TCP buffer set to 4000000 bytes
250 CWD command successful.
Transfer of 2147287040 bytes completed in 294.12 seconds. 7300.82 KB/sec

 


MSS (AR) -> Co (guc)

Attempt to transfer an archived file via globus-url-copy

  • globus-url-copy returns an error if the file is archived.
  • Archive retrieval is initiated by the (failed) request for the file.
  • A workaround for this would be to loop until no error is returned, or use uberftp with 'quote wait'.
  • The "-rp" (relative path) option allows paths to be specified relative to $HOME on either system. A %2F in the path forces the path to start at the root directory.
> globus-url-copy -nodcau -tcp-bs 65536 -rp gsiftp://mss.ncsa.teragrid.org/1GB-2 gsiftp://gridftp-co.ncsa.teragrid.org/%2F/scratch/users/jdoe/
error: globus_ftp_client: the server responded with an error
550 1GB-2: is being retrieved from the archive...
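
The "loop until no error is returned" workaround mentioned above could be sketched as follows. This is a hedged Python illustration, not a tool provided by NCSA or Globus: the function name copy_with_retry, the attempt limit, and the delay are hypothetical choices, and the sketch assumes that globus-url-copy exits with a non-zero status while the file is still being staged.

```python
import subprocess
import time

# Command copied from the example above; adjust the URLs for your own files.
GUC_CMD = [
    "globus-url-copy", "-nodcau", "-tcp-bs", "65536", "-rp",
    "gsiftp://mss.ncsa.teragrid.org/1GB-2",
    "gsiftp://gridftp-co.ncsa.teragrid.org/%2F/scratch/users/jdoe/",
]


def copy_with_retry(cmd, max_attempts=20, delay_s=60.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0; return the number of attempts used.

    Assumes a non-zero exit status means the file is not yet staged.
    The run and sleep parameters exist so the loop can be tested without
    touching the network.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Give the tape archive time to stage the file before asking again.
            sleep(delay_s)
    raise RuntimeError("transfer still failing after %d attempts" % max_attempts)
```

Calling copy_with_retry(GUC_CMD) would issue the first (failing) request that triggers staging and then repeat the request once a minute. Because the loop cannot tell a staging delay from a genuine failure, the attempt limit keeps it from running forever on a bad path or an expired proxy.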

 


MSS -> Co (guc)

Third party transfer from MSS to a remote TG site using globus-url-copy.

> time globus-url-copy -nodcau -tcp-bs 65536 -rp gsiftp://mss.ncsa.teragrid.org/2GB gsiftp://gridftp-co.ncsa.teragrid.org/%2F/scratch/users/dadams/
real    1m13.367s
user    0m0.197s
sys     0m0.016s

 


guc large file

Third party transfer of a large file from NCSA to SDSC.

  • The stripe option is used here to utilize all of the servers at each end.
  • Large files benefit the most from striping.
  • A 10 MB TCP window size is set with -tcp-bs to keep the pipe full between SDSC and NCSA (~60 ms RTT).
> globus-url-copy -vb -tcp-bs 10000000 -stripe gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/10GB/10GB-1 gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/
Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/10GB/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/
  10GB-1
  10737418240 bytes       567.29 MB/sec avg       634.24 MB/sec inst
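
The reasoning behind the window size can be made concrete with the standard bandwidth-delay relationship: a single TCP stream cannot move more than one window of data per round trip. The Python sketch below is only an illustration under simplifying assumptions (no packet loss, no operating system limit on the buffer size); the function names are hypothetical and the 1 Gbit/s figure is an example value, not a measurement of the NCSA to SDSC path.

```python
def window_limited_rate(window_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput in bytes/sec: one window per RTT."""
    return window_bytes / rtt_s


def bandwidth_delay_product(bandwidth_bytes_per_s, rtt_s):
    """Window size in bytes needed to keep a path of the given bandwidth full."""
    return bandwidth_bytes_per_s * rtt_s


RTT_S = 0.060             # ~60 ms round trip, from the note above
WINDOW_BYTES = 10_000_000  # value passed to -tcp-bs in the example

ceiling = window_limited_rate(WINDOW_BYTES, RTT_S)
print(f"per-stream ceiling: {ceiling / 1e6:.1f} MB/sec")

# Hypothetical example: window needed to fill a 1 Gbit/s path at the same RTT.
needed = bandwidth_delay_product(1e9 / 8, RTT_S)
print(f"window needed for 1 Gbit/s: {needed / 1e6:.1f} MB")
```

Under these assumptions a single stream with this window tops out well below the aggregate rate shown in the transcript, which is consistent with the note that striping across several servers is what makes the large-file transfer fast.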

 


MSS (AR) -> Hg

Retrieval of an archived file from MSS using uberftp with the "quote wait" command.

  • Note the use of the "quote wait" command. This instructs the client to wait for the file(s) to be staged from the tape archive and then carry out the transfer.
> uberftp mss.ncsa.teragrid.org "lopen gridftp-hg.ncsa.teragrid.org; lcd /gpfs_scratch1/dadams; active; mode stream; quote wait; get 1GB-2"
UNIX Archive FTP server (DiskXtender Version 2.9) active. Checking DiskXtender.conf

220 UNIX Archive FTP server ready.
230 User dadams logged in.
220 tg-s037.ncsa.teragrid.org GridFTP Server 2.1 (gcc64dbg, 1122653280-63) ready.
230 User dadams logged in.
250 CWD command successful.
Active mode
Stream mode
258 WAIT on
Transfer of 1073741824 bytes completed in 128.07 seconds. 8384.30 KB/sec

 


HPN-scp None cipher

Transfer using an HPN-enabled sshd server and scp client.

> ~/local/bin/scp -P 222 -oNoneEnabled=yes -oNoneSwitch=yes /gpfs_scratch1/jdoe/1GB co-login1.ncsa.uiuc.edu:/scratch/users/jdoe/1GB
WARNING: ENABLED NONE CIPHER
1GB
WARNING: ENABLED NONE CIPHER
1GB                                           100% 1024MB  25.0MB/s    00:41

 


MSS tarfile

Make tarfile of current directory and send to MSS.

  • The tar file is created only on MSS.
> msscmd "mkdir test-tar, cd test-tar, tar cvf test.tar ."
Using binary mode to transfer files.
Using 65536 byte TCP window size.
./
./d1/
./d1/m_000001
./d2/
./d2/m_000001
./d3/
./d3/m_000001

 


msscmd: list tar

List contents of tar file on MSS using msscmd.

  • Tar archive exists only on MSS.
> msscmd "cd test-tar, tar tvf test.tar"
Using binary mode to transfer files.
Using 65536 byte TCP window size.
drwxr-x--- dadams/aaa        0 2006-09-28 11:37:38 ./
drwxr-x--- dadams/aaa        0 2006-09-28 11:47:37 ./d1/
-rw-r----- dadams/aaa 10485760 2006-09-28 11:37:28 ./d1/m_000001
drwxr-x--- dadams/aaa        0 2006-09-28 11:47:45 ./d2/
-rw-r----- dadams/aaa 10485760 2006-09-28 11:37:32 ./d2/m_000001
drwxr-x--- dadams/aaa        0 2006-09-28 11:47:49 ./d3/
-rw-r----- dadams/aaa 10485760 2006-09-28 11:37:40 ./d3/m_000001

 


tgcp large file

Transfer of a large file from NCSA to SDSC using the TeraGrid copy (tgcp) tool.

  • Passing the -big option to tgcp tells it to employ striping.
  • The URL syntax is more relaxed here. Basically, you get the third party benefits without all the typing.
  • Notice that the full globus-url-copy command is printed. This is a good way to develop globus-url-copy commands.
> tgcp -vb -big ./10GB-1 tg-gridftp.sdsc.teragrid.org:/gpfs/dadams/
/usr/local/globus-4.0.1-r3//bin/globus-url-copy -vb -stripe -p 4 -tcp-bs 4000000 gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/10GB/10GB-1 gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/

Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/10GB/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/
  10GB-1
  10737418240 bytes       582.65 MB/sec avg       918.36 MB/sec inst

 


tgcp RFT

Transfer a single file using the RFT transfer management system.

  • RFT functionality is implemented with the -rft option.
  • tgcp automatically generates an input file for the rft command and then executes it.
  • Using the -v option, the file name of the temporary input file for rft is displayed, and its contents can be inspected.
> tgcp -v -rft 1GB tg-gridftp.sdsc.teragrid.org:/gpfs/dadams/
/usr/local/globus-4.0.1-r3/bin/rft -h rft-hg.ncsa.teragrid.org -r 8443 -l 60 -z host -f /tmp/filelTzYMv

Number of transfers in this request: 1
Subscribed for overall status
Termination time to set: 60 minutes

 Overall status of transfer:
Finished/Active/Failed/Retrying/Pending
0/1/0/0/0

 Overall status of transfer:
Finished/Active/Failed/Retrying/Pending
1/0/0/0/0
All Transfers are completed

 


Client -> Server

Pushing data from client process to waiting GridFTP server.

  • The "-vb" option is included to show performance data.
> globus-url-copy -vb file:///gpfs_scratch1/nopurge/dadams/1GB gsiftp://gridftp-co.ncsa.teragrid.org/scratch/users/dadams/1GB
Source: file:///gpfs_scratch1/nopurge/dadams/
Dest:   gsiftp://gridftp-co.ncsa.teragrid.org/scratch/users/dadams/
  1GB
   1045430272 bytes         28.58 MB/sec avg        30.00 MB/sec inst

 


guc: Client -> Server

Pushing data from a client process to a waiting GridFTP server, using parallel streams.

  • The parallel streams parameter is adjusted here to claim more "fair shares" of the network.
> globus-url-copy -vb -p 8 file:///gpfs_scratch1/nopurge/dadams/1GB gsiftp://gridftp-co.ncsa.teragrid.org/scratch/users/dadams/1GB
Source: file:///gpfs_scratch1/nopurge/dadams/
Dest:   gsiftp://gridftp-co.ncsa.teragrid.org/scratch/users/dadams/
  1GB
   1073741824 bytes         72.62 MB/sec avg        70.49 MB/sec inst

 


guc: whole directory

Copy an entire directory hierarchy to another location.

  • The "-fast" option tells the GridFTP servers to always use Mode E to complete transfers. Among other things, this enables the reuse of data channel connections, reducing the overhead of each individual file transfer.
  • Adding the "-r" and "-cd" options ensures that directories are recursively copied and created when needed.
  • The trailing slashes on the source and destination URLs indicate a directory is being copied rather than a file.
> globus-url-copy -vb -tcp-bs 8388608 -fast -r -cd gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/dfiles/ gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/dfiles/
Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/dfiles/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/dfiles/
  d1/

  d2/

  d3/

Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/dfiles/d1/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/dfiles/d1/
  m_000001
     10485760 bytes        16.67 MB/sec avg        16.67 MB/sec inst
Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/dfiles/d2/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/dfiles/d2/
  m_000001

Source: gsiftp://gridftp-hg.ncsa.teragrid.org/gpfs_scratch1/nopurge/dadams/data/dfiles/d3/
Dest:   gsiftp://tg-gridftp.sdsc.teragrid.org/gpfs/dadams/dfiles/d3/
  m_000001

 


SSH Streaming Tar

Copy a local directory structure via streaming tar onto NCSA TeraGrid.

> tar -cf - tst/ | ssh user@tg-login4.ncsa.teragrid.org "tar xf -"

 


SSH Streaming to Tar

Copy a local directory into a tarball on the Tungsten cluster.

> tar -cf - tst/ | ssh user@tuna.ncsa.uiuc.edu "cat > tst.tar"

 


MSS with uberftp

Transfer a file from MSS using the uberftp client.

  • Note the use of the "quote wait" command. This tells uberftp to wait for the tape retrieval process to complete before attempting to access the file.
> uberftp mss.ncsa.teragrid.org "lopen gridftp-w.ncsa.teragrid.org; lcd /cfs/scratch/users/arnoldg; cd junk; quote wait; get new.tar"
220 UNIX Archive FTP server ready.
230 User arnoldg logged in.
220 tunf001.ncsa.uiuc.edu GridFTP Server 2.1 (gcc32dbg, 1122653280-63) ready.
230 User arnoldg logged in.
WAIT is now turned on.
new.tar:  24948008960 bytes in 1433.74 seconds. 17400.62 KB/sec