Grid-in-a-Box Test Script

The Grid-in-a-Box test script is a simple Perl script that tests the setup of Grid-enabled Linux clusters for the Grid in a Box (GiB) project. The script is available here.

Usage

usage: gib-setup-test [-h htmlfile] [-k] [-t secs] [-m hostname] [-g hostname] [-b mdsbasedn] site1 [site2 ...]
       where -h writes html results to htmlfile
             -t specifies a timeout for child processes
             -m specifies the hostname of a gsiftp server
             -g specifies the hostname of the GIIS
             -b specifies the GIIS MDS branch-point
             site1 [site2 ...] is a list of sites to test
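
The -t option bounds how long each child process may run. As a rough illustration of that idea only (the real script is written in Perl and may enforce the limit differently), the shell sketch below wraps a command in the GNU coreutils timeout utility. The function name run_with_timeout is hypothetical.

```shell
# Hypothetical helper: run a command, giving up after a number of seconds.
# Assumes GNU coreutils `timeout` is installed; gib-setup-test itself is a
# Perl script and may implement its -t limit in another way.
run_with_timeout() {
    secs=$1
    shift
    timeout "$secs" "$@"
    status=$?
    # GNU timeout exits with status 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${secs}s: $*" >&2
    fi
    return "$status"
}
```

For example, run_with_timeout 60 globusrun -a -r hostname would abandon the authentication check after one minute.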

Prerequisites

You must have a valid proxy credential to run the gib-setup-test script. Run grid-proxy-init to obtain a proxy credential.
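
Checking for a proxy before starting a long test run can save time. The sketch below assumes the Globus grid-proxy-info command and its -exists option are available on your system. The function name check_proxy and the PROXY_INFO_CMD variable are hypothetical and are not part of gib-setup-test.

```shell
# Command used to inspect the proxy. Overridable (an assumption made here
# so the check can be exercised on a machine without Globus installed).
PROXY_INFO_CMD=${PROXY_INFO_CMD:-grid-proxy-info}

# Hypothetical pre-flight check: succeed only if a valid proxy exists.
check_proxy() {
    if $PROXY_INFO_CMD -exists >/dev/null 2>&1; then
        echo "proxy ok"
    else
        echo "no valid proxy: run grid-proxy-init first"
        return 1
    fi
}
```

A wrapper could call check_proxy and run gib-setup-test only when it succeeds.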

Test Definitions

The script performs the following tests.
GIIS
This test verifies that the remote machine is advertising its jobmanagers to the Grid Index Information Service for your virtual organization (VO). By default it queries the GiB GIIS (mds-gib.ncsa.uiuc.edu) for each machine. An alternative GIIS and VO can be specified with the -g and -b options. The script runs grid-info-search -h mds-gib.ncsa.uiuc.edu -x -b 'mds-vo-name=mds-gib,o=grid' to perform this test.
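
To make the effect of the -g and -b options concrete, the sketch below prints the query command for a given GIIS host and base DN, falling back to the defaults quoted above. It only prints the command and does not run it. The function name giis_query_cmd is hypothetical, and how the script actually substitutes these values is an assumption.

```shell
# Hypothetical helper: print the GIIS query for a host and base DN.
# Defaults are the GiB values quoted in the text above.
giis_query_cmd() {
    giis_host=${1:-mds-gib.ncsa.uiuc.edu}
    base_dn=${2:-mds-vo-name=mds-gib,o=grid}
    printf "grid-info-search -h %s -x -b '%s'\n" "$giis_host" "$base_dn"
}
```

Calling giis_query_cmd with no arguments prints the default query shown above; passing a host and base DN prints the query for an alternative GIIS and VO.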
Gatekeeper
This test verifies that you can successfully authenticate to the gatekeeper on the remote machine. Authenticating to the gatekeeper is a prerequisite to running jobs via the jobmanager(s). If you fail to authenticate to the gatekeeper, you will likely be unable to authenticate to any other Grid service on this machine. The script runs globusrun -a -r hostname to perform this test. This test may fail for a number of reasons.
GRIS
This test performs a test query against the remote machine's Grid Resource Information Service. The script runs grid-info-search -h $site -x to perform this test.
GSIFTP
This test verifies that the GSI-enabled FTP service is available on the remote machine. The script tests the service with the following commands:
GSISSH
This test verifies that you can log in and run commands via the GSISSH service on the remote machine. This test requires a GSISSH client named gsissh to be in your shell's search path. The script runs gsissh -o "BatchMode yes" $site /bin/echo "GBOX TEST" to perform this test.
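
One plausible way to turn this command into a pass/fail result is to compare its output with the expected string. The sketch below does that. How the script itself decides success is an assumption, and the names gsissh_test and GSISSH_CMD are hypothetical.

```shell
# Client command. Overridable (an assumption made here so the check can
# be exercised without a GSISSH client installed).
GSISSH_CMD=${GSISSH_CMD:-gsissh}

# Hypothetical check: the remote echo must return exactly "GBOX TEST".
gsissh_test() {
    site=$1
    output=$($GSISSH_CMD -o "BatchMode yes" "$site" /bin/echo "GBOX TEST" 2>/dev/null)
    if [ "$output" = "GBOX TEST" ]; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}
```

BatchMode prevents the client from prompting for a password, so an authentication failure produces a FAIL result rather than a hung test.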
Condor-G
This test verifies that Condor-G is installed and running on the remote machine by running condor_q on the remote machine and examining the output.
Simple Job Fork Jobmanager
This test verifies that you can run a simple job on the remote machine via the fork jobmanager. The script runs globusrun -o -r hostname/jobmanager '&(executable="/bin/echo")(arguments="GBOX TEST")' to perform this test.
MPICH Job Fork Jobmanager
This test verifies that you can run a simple MPICH job using 2 CPUs on the remote machine via the fork jobmanager. The script runs globusrun -o -s -r hostname/jobmanager '&(jobType=mpi)(executable=$(GLOBUSRUN_GASS_URL) # "mpich-cpi")(count=2)' to perform this test. The script will build the mpich-cpi program from your local MPICH installation if possible.
Simple Job PBS Jobmanager
This test verifies that you can run a simple job on the remote machine via the PBS jobmanager (i.e., submitting to the PBS batch system on the remote machine). The script runs globusrun -o -r hostname/jobmanager-pbs '&(executable="/bin/echo")(arguments="GBOX TEST")' to perform this test.
MPICH Job PBS Jobmanager
This test verifies that you can run a simple MPICH job using 2 CPUs on the remote machine via the PBS jobmanager. The script runs globusrun -o -s -r hostname/jobmanager-pbs '&(jobType=mpi)(executable=$(GLOBUSRUN_GASS_URL) # "mpich_test_program")(count=2)' to perform this test.
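
The fork and PBS variants of the job tests differ only in the contact string passed to globusrun -r. The sketch below shows that mapping and prints the simple-job command without running it. The function names and the fork/pbs keywords are hypothetical conveniences, not part of the script.

```shell
# Map a jobmanager type to the contact string used with `globusrun -r`.
jobmanager_contact() {
    host=$1
    case $2 in
        fork) printf '%s/jobmanager\n' "$host" ;;
        pbs)  printf '%s/jobmanager-pbs\n' "$host" ;;
        *)    echo "unknown jobmanager type: $2" >&2; return 1 ;;
    esac
}

# Print (do not run) the simple-job command for a host and jobmanager type.
simple_job_cmd() {
    contact=$(jobmanager_contact "$1" "$2") || return 1
    printf "globusrun -o -r %s '%s'\n" "$contact" \
        '&(executable="/bin/echo")(arguments="GBOX TEST")'
}
```

Printing the command first makes it easy to paste into a shell and rerun a single failing test by hand.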
Glide-in Setup
This test sets up Condor Glide-in on the remote machine and compiles a test program, linked with the Condor libraries, to be used in later Condor Glide-in tests. The script runs the following commands: This test may fail for a number of reasons:
Glide-in Fork Jobmanager
This test submits a Condor Glide-in to the remote machine using the fork jobmanager and submits a test Condor job to run under the Glide-in, using the following commands: This test may fail for a number of reasons:
Glide-in PBS Jobmanager
Similar to the Glide-in Fork Jobmanager test, this test submits a Condor Glide-in to the remote machine using the PBS jobmanager and submits a test Condor job to run under the Glide-in, using the following commands: This test may fail for the same reasons listed for the Glide-in Fork Jobmanager test. Additionally, the test may fail because PBS ran the Glide-in job on a machine on a private network that cannot connect directly back to your machine.
GSIFTP from Fork Job
This test verifies that a job submitted to the fork jobmanager on the remote machine can transfer files to and from a remote GSI-enabled FTP server. The server is specified by the -m argument and defaults to cornhead.ncsa.uiuc.edu. The script first uploads a file to the GSIFTP server with the command gsincftpput -E $gsiftp_server "~" /tmp/gbox-test.username.hostname. The script then submits a test job with the command globusrun -o -s -r hostname/jobmanager '&(executable=$(GLOBUSRUN_GASS_URL) # "gbox-gsiftpjob-test.sh")'. The test job retrieves the file from the GSIFTP server with the command $GLOBUS_LOCATION/bin/gsincftpget -E $gsiftp_server . "~/gbox-test.username.hostname". This test may fail for the following reasons:
GSIFTP from PBS Job
Similar to the GSIFTP from Fork Job test, this test verifies that a job submitted to the PBS jobmanager on the remote machine can transfer files to and from a remote GSI-enabled FTP server. A common cause of failure is that the directory containing the trusted Certificate Authority certificates is not installed on the PBS nodes.
The script also attempts to determine the following information about each site.
GiB Version
The script runs /usr/local/globus/bin/gib-version on the remote machine to get the GiB Version.
PBS Nodes
The script runs /usr/local/pbs/bin/pbsnodes -a on the remote machine to obtain the number of PBS nodes.
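
As an illustration of how a node count could be derived from this output, the sketch below counts node records on standard input. It assumes that each record begins with an unindented node name and that attribute lines are indented, which should be checked against your PBS version. The script's own parsing is not shown here and may differ, and the function name count_pbs_nodes is hypothetical.

```shell
# Count nodes in `pbsnodes -a` style output read from standard input.
# Assumption: each node record starts with an unindented node name, and
# attribute lines (state, np, ...) are indented with spaces or tabs.
count_pbs_nodes() {
    awk '/^[^ \t]/ { n++ } END { print n + 0 }'
}
```

On a cluster this would be used as /usr/local/pbs/bin/pbsnodes -a | count_pbs_nodes.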
PBS CPUs
PBS Jobs Running
PBS Jobs Queued
The script runs /usr/local/pbs/bin/qstat -Q on the remote machine to obtain the number of CPUs controlled by PBS and the number of running and queued jobs.