Grid-in-a-Box Test Script
The Grid-in-a-Box test script is a simple Perl script that tests the
setup of Grid-enabled Linux clusters for the Grid in a Box project.
The script is available here.
Usage
usage: gib-setup-test [-h htmlfile] [-k] [-t secs] [-m hostname] [-g hostname] [-b mdsbasedn] site1 [site2 ...]
where -h writes html results to htmlfile
-t specifies a timeout for child processes
-m specifies the hostname of a gsiftp server
-g specifies the hostname of the GIIS
-b specifies the GIIS MDS branch-point
site1 [site2 ...] is a list of sites to test
Prerequisites
You must have a valid proxy credential to run the gib-setup-test
script. Run grid-proxy-init to obtain a proxy credential.
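Before starting a long test run, it can help to confirm that the proxy still has enough lifetime left. The sketch below assumes that grid-proxy-info -timeleft prints the remaining lifetime in seconds. The helper names and the one-hour threshold are illustrative and are not part of gib-setup-test.

```python
import subprocess


def parse_timeleft(output):
    """Return the remaining proxy lifetime in seconds, or 0 if unusable."""
    try:
        return max(int(output.strip()), 0)
    except ValueError:
        return 0


def proxy_seconds_left():
    """Ask grid-proxy-info for the remaining lifetime (assumed to be seconds)."""
    try:
        result = subprocess.run(
            ["grid-proxy-info", "-timeleft"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return 0  # command missing or hung: treat as "no proxy"
    return parse_timeleft(result.stdout)


def proxy_is_usable(seconds_left, minimum=3600):
    """Hypothetical policy: require at least `minimum` seconds of lifetime."""
    return seconds_left >= minimum
```

If proxy_is_usable(proxy_seconds_left()) is false, run grid-proxy-init again before starting the tests.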
Test Definitions
The script performs the following tests.
- GIIS
- This test verifies that the remote machine is advertising its
jobmanagers to the Grid Index Information Service for your virtual
organization (VO). By default it queries the GiB GIIS
(mds-gib.ncsa.uiuc.edu) for each machine. An alternative GIIS and
VO can be specified with the -g and -b
options. The script runs grid-info-search -h
mds-gib.ncsa.uiuc.edu -x -b 'mds-vo-name=mds-gib,o=grid' to
perform this test.
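As a rough illustration of how the -g and -b options change the query, the sketch below builds the grid-info-search command line with the documented defaults. The function names are hypothetical, and the substring check on the output is only an assumption about how a result might be examined, not the script's actual logic.

```python
DEFAULT_GIIS = "mds-gib.ncsa.uiuc.edu"
DEFAULT_BASE_DN = "mds-vo-name=mds-gib,o=grid"


def giis_query_command(giis_host=None, base_dn=None):
    """Build the GIIS query; -g and -b would override the two defaults."""
    return [
        "grid-info-search",
        "-h", giis_host or DEFAULT_GIIS,
        "-x",
        "-b", base_dn or DEFAULT_BASE_DN,
    ]


def mentions_site(search_output, site):
    """Assumed check: the site's hostname appears somewhere in the output."""
    return site.lower() in search_output.lower()
```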
- Gatekeeper
- This test verifies that you can successfully authenticate to the
gatekeeper on the remote machine. Authenticating to the gatekeeper
is a prerequisite to running jobs via the jobmanager(s). If you
fail to authenticate to the gatekeeper, there is a good chance you
won't be able to authenticate to any Grid services on this machine.
This test may fail for a number of reasons:
- You are not in the grid-mapfile on the remote machine (i.e., you
don't have an account there).
- The remote machine does not have a valid certificate, or the
certificate for the remote machine is not signed by a Certificate
Authority that you trust.
The script runs globusrun -a -r hostname to perform this
test.
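Each test runs an external command, and the -t option bounds how long a child process may take. A minimal sketch of that pattern follows, written in Python rather than the script's Perl. The function names and the three status strings are assumptions made for illustration.

```python
import subprocess


def gatekeeper_command(hostname):
    """Command line for the gatekeeper authentication test."""
    return ["globusrun", "-a", "-r", hostname]


def run_with_timeout(argv, timeout_secs):
    """Run argv and return (status, output).

    status is "pass" (exit code 0), "fail" (non-zero exit or the command
    could not be started), or "timeout" (killed after timeout_secs).
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_secs,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except OSError as exc:
        return "fail", str(exc)
    status = "pass" if result.returncode == 0 else "fail"
    return status, result.stdout
```

For example, run_with_timeout(gatekeeper_command("host.example.org"), 60) would report "timeout" if the gatekeeper does not answer within a minute.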
- GRIS
- This test performs a test query against the remote machine's
Grid Resource Information Service. The script runs
grid-info-search -h $site -x to
perform this test.
- GSIFTP
- This test verifies that the GSI-enabled FTP service is available
on the remote machine. The script tests the service with the
following commands:
- gsincftpput -E hostname /tmp /tmp/gbox-test.username.hostname
- gsincftpget -E hostname /tmp /tmp/gbox-test.username.hostname
- GSISSH
- This test verifies that you can log in and run commands via the
GSISSH service on the remote machine. This test requires a GSISSH
client named gsissh to be in your shell's search path. The script
runs gsissh -o "BatchMode yes" $site /bin/echo "GBOX
TEST" to perform this test.
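Because this test depends on a gsissh client being in the search path, a quick pre-check can separate a missing client from a real service failure. The sketch below is one way to do that. The helper names are hypothetical, and checking the output for "GBOX TEST" is an assumption about how success might be judged.

```python
import shutil


def client_available(name="gsissh"):
    """True if an executable called `name` is found in the search path."""
    return shutil.which(name) is not None


def gsissh_command(site):
    """Argument list equivalent to the documented gsissh invocation."""
    return ["gsissh", "-o", "BatchMode yes", site, "/bin/echo", "GBOX TEST"]


def output_ok(output):
    """Assumed success criterion: the echoed marker appears in the output."""
    return "GBOX TEST" in output
```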
- Condor-G
- This test verifies that Condor-G is installed and running on the
remote machine by running condor_q
on the remote machine and examining the output.
- Simple Job Fork Jobmanager
- This test verifies that you can run a simple job on the remote
machine via the fork jobmanager. The script runs globusrun -o
-r hostname/jobmanager '&(executable="/bin/echo")(arguments="GBOX
TEST")' to perform this test.
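The fork and PBS variants of the simple job test differ only in the jobmanager contact string. The sketch below shows how the RSL string and the globusrun argument list could be assembled. The function names are illustrative and do not come from the script.

```python
def simple_job_rsl(executable="/bin/echo", arguments="GBOX TEST"):
    """Build the RSL string used by the simple job tests."""
    return '&(executable="%s")(arguments="%s")' % (executable, arguments)


def simple_job_command(hostname, jobmanager="jobmanager"):
    """Build the globusrun call; pass "jobmanager-pbs" for the PBS variant."""
    contact = "%s/%s" % (hostname, jobmanager)
    return ["globusrun", "-o", "-r", contact, simple_job_rsl()]
```

Passing the RSL as a single list element avoids the shell quoting that the single quotes provide on the command line.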
- MPICH Job Fork Jobmanager
- This test verifies that you can run a simple MPICH job using 2
CPUs on the remote machine via the fork jobmanager. The script runs
globusrun -o -s -r hostname/jobmanager
'&(jobType=mpi)(executable=$(GLOBUSRUN_GASS_URL) #
"mpich-cpi")(count=2)' to perform this test.
The script builds the mpich-cpi program from your local MPICH
installation if possible.
- Simple Job PBS Jobmanager
- This test verifies that you can run a simple job on the remote
machine via the PBS jobmanager (i.e., submitting to the PBS batch
system on the remote machine). The script runs globusrun -o
-r hostname/jobmanager-pbs '&(executable="/bin/echo")(arguments="GBOX
TEST")' to perform this test.
- MPICH Job PBS Jobmanager
- This test verifies that you can run a simple MPICH job using 2
CPUs on the remote machine via the PBS jobmanager. The script runs
globusrun -o -s -r hostname/jobmanager-pbs
'&(jobType=mpi)(executable=$(GLOBUSRUN_GASS_URL) #
"mpich_test_program")(count=2)' to perform this test.
- Glide-in Setup
- This test sets up Condor Glide-in on the remote machine and
compiles a test program, linked with the Condor libraries, to be
used in later Condor Glide-in tests. The script runs the following
commands:
- condor_compile cc -o gib-setup-test.condor gib-setup-test.condor.c
- condor_glidein --setuponly hostname
This test may fail for a number of reasons:
- You do not have Condor-G installed on the machine where you are
running gib-setup-test.
- You do not have permission to access the Condor Glide-in FTP
site. You can request access by sending email to
condor-admin@cs.wisc.edu.
Include your certificate subject (the output of grid-cert-info
-subject) in the request.
- Glide-in Fork Jobmanager
- This test submits a Condor Glide-in to the remote machine using
the fork jobmanager and submits a test Condor job to run under the
Glide-in, using the following commands:
- condor_glidein --runonly --idletime 10 hostname/jobmanager
- condor_submit /tmp/condor_submit.pid
This test may fail for a number of reasons:
- There may be a problem with your local Condor-G installation.
Check the logs in the runtime/log directory of your Condor-G
installation for errors.
- Your Condor-G installation may not be configured to support
Glide-in. COLLECTOR and NEGOTIATOR should be included in the
DAEMON_LIST parameter in etc/condor_config. (Search for "GlideIn"
in etc/condor_config.)
- The remote machine may not have permission to join your "Condor
Pool". The hostname of the remote machine should match one of the
expressions in the GLIDEIN_SITES list in etc/condor_config.
- Network ports used by Glide-in between the remote machine and
your machine may be blocked. Glide-in uses ports in the
dynamic/private range (typically 32768-65535).
- Condor Glide-in bound itself to the remote machine's loopback
interface (127.0.0.1) due to a misconfiguration of the /etc/hosts
file. Check ~/Condor_glidein/local/log.* on the remote machine for
log files.
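The loopback problem in the last item can often be spotted by inspecting /etc/hosts on the remote machine. The sketch below is a hypothetical helper, not part of gib-setup-test, that reports whether a hostname is mapped to a 127.x.x.x address in hosts-file text.

```python
def loopback_names(hosts_text):
    """Return the names that hosts-file text maps to a 127.x.x.x address."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0].startswith("127."):
            names.update(name.lower() for name in fields[1:])
    return names


def hostname_on_loopback(hosts_text, hostname):
    """True if the machine's own name (full or short) resolves to loopback."""
    names = loopback_names(hosts_text) - {"localhost", "localhost.localdomain"}
    full = hostname.lower()
    short = full.split(".", 1)[0]
    return full in names or short in names
```

A line such as "127.0.0.1 node1.example.org node1 localhost" would be flagged, while a file that lists only localhost on the loopback line would not.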
- Glide-in PBS Jobmanager
- Similar to the Glide-in Fork
Jobmanager test, this test submits a Condor Glide-in to the
remote machine using the PBS jobmanager and submits a test Condor
job to run under the Glide-in, using the following commands:
- condor_glidein --runonly --idletime 10 hostname/jobmanager-pbs
- condor_submit /tmp/condor_submit.pid
This test may fail for the same reasons listed for the
Glide-in Fork Jobmanager test.
Additionally, the test may fail because PBS ran the Glide-in job on
a machine on a private network that cannot connect directly back to
your machine.
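Connectivity problems of this kind, like the blocked-port case listed for the fork test, come down to whether a TCP connection can be opened from the remote side back to your machine. A minimal probe is sketched below. The host and port to test are assumptions you would supply yourself, since the ports Glide-in picks are dynamic.

```python
import socket


def can_connect(host, port, timeout_secs=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_secs):
            return True
    except OSError:
        return False
```

Running this on a PBS node against a port that is known to be listening on your machine gives a quick indication of whether the node sits behind a private network or a firewall.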
- GSIFTP from Fork Job
- This test verifies that a job submitted to the fork jobmanager
on the remote machine can transfer files to and from a remote
GSI-enabled FTP server. The server is specified by the
-m argument and defaults to cornhead.ncsa.uiuc.edu. The
script first uploads a file to the GSIFTP server with the command
gsincftpput -E $gsiftp_server "~"
/tmp/gbox-test.username.hostname. The script then submits a
test job with the command globusrun -o -s -r
hostname/jobmanager '&(executable=$(GLOBUSRUN_GASS_URL) #
"gbox-gsiftpjob-test.sh")'. The test job retrieves the file
from the GSIFTP server with the command
$GLOBUS_LOCATION/bin/gsincftpget -E $gsiftp_server
. "~/gbox-test.username.hostname". This test may fail for the
following reasons:
- gsincftpget is not installed on the remote machine.
- DNS resolution is not working on the remote machine, usually
resulting in "host unknown" errors.
- The remote machine does not trust the Certificate Authority that
signed the GSIFTP server's certificate.
- GSIFTP from PBS Job
- Similar to the GSIFTP from Fork Job
test, this test verifies that a job submitted to the PBS jobmanager
on the remote machine can transfer files to and from a remote
GSI-enabled FTP server. This test commonly fails because the
directory containing the trusted Certificate Authority certificates
is not installed on the PBS nodes.
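A missing trusted-certificates directory can be checked directly on a PBS node. The sketch below assumes the common Globus convention that the directory is named by the X509_CERT_DIR environment variable, falling back to /etc/grid-security/certificates, and that CA certificates are stored as files ending in ".0". It ignores other possible locations, and the function names are hypothetical.

```python
import glob
import os


def trusted_ca_dir(environ=None):
    """Pick the trusted CA directory (simplified lookup, see assumptions above)."""
    if environ is None:
        environ = os.environ
    return environ.get("X509_CERT_DIR", "/etc/grid-security/certificates")


def ca_certificates_present(directory):
    """True if the directory exists and holds at least one '<hash>.0' file."""
    if not os.path.isdir(directory):
        return False
    return bool(glob.glob(os.path.join(directory, "*.0")))
```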
The script also attempts to determine the following information about
each site.
- GiB Version
- The script runs /usr/local/globus/bin/gib-version on
the remote machine to get the GiB Version.
- PBS Nodes
- The script runs /usr/local/pbs/bin/pbsnodes -a on the
remote machine to obtain the number of PBS nodes.
- PBS CPUs
- PBS Jobs Running
- PBS Jobs Queued
- The script runs /usr/local/pbs/bin/qstat -Q on the
remote machine to obtain the number of CPUs controlled by PBS and
the number of running and queued jobs.
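To show how a node count might be derived from pbsnodes -a, the sketch below parses output in the layout commonly produced by OpenPBS-style systems: one unindented line per node name, followed by indented "key = value" attribute lines. That layout is an assumption, and the function is illustrative rather than the script's own parser.

```python
def summarize_pbsnodes(text):
    """Return (node_count, {state: count}) from pbsnodes -a style output."""
    node_count = 0
    states = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator between node records
        if not line[0].isspace():
            node_count += 1  # unindented line: a node name
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "state":
            state = value.strip()
            states[state] = states.get(state, 0) + 1
    return node_count, states
```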