lsf
UPDATE 12/19/02
Dedicated queues have had their wall clock time limits increased to be in line
with the standard queues:
short: 50 hours
medium: 200 hours
long: 400 hours
-------------------------------------------------------------------------
UPDATE 7/26/02
balder (256-processor Origin2000) has been split into two 128-processor systems.
As a result, the 256-processor queues are no longer available.
-------------------------------------------------------------------------
UPDATE 8/30/01
All dedicated queues (short, medium, and long) will now be active at all times.
(Previously, the long queues were active only during the weekend.)
-------------------------------------------------------------------------
LSF lsbatch on the SGI Cray Origin2000
(Load Share Facility)
lsbatch (LSF version 3.0) is available on the SGI Cray Origin2000. The best
starting place for LSF information is the lsbatch man page, which describes
all the batch-related commands. (To get the LSF man pages, add
/usr/local/lsf/man to your MANPATH variable if it is not already there.)
Batch System Procedures
-----------------------
(a) Timeshared Queues
Each job must declare the following at submission time, via bsub options:
   * the number of threads
   * the peak memory
   * the total job run time
Jobs are routed to the appropriate queue based on these parameters. The
default queues are:
Debug Queue:
                Normalized     Normalized
   Queue        Job            Job             Job
   Name         Run Time       CPU Time        Size
   ---------------------------------------------------
   debug        10 mins        5 mins          1-4 threads
                               per thread      < 2 Gb memory
   ---------------------------------------------------
Regular Queues:
                                 Queue Names
   Normalized                     Job Size
   Job
   Run time      Small            Medium           Large
                 1-8 threads      9-16 threads     17-64 threads
                 < 2 Gb memory    < 4 Gb memory    < 25 Gb memory
   ------------------------------------------------------------------
   5 hours       vst_sj           vst_mj           vst_lj
                 ind_vst_sj       ind_vst_mj       ind_vst_lj
   50 hours      st_sj            st_mj            st_lj
                 ind_st_sj        ind_st_mj        ind_st_lj
   200 hours     mt_sj            mt_mj            mt_lj
                 ind_mt_sj        ind_mt_mj        ind_mt_lj
   400 hours     lt_sj            lt_mj            lt_lj
                 ind_lt_sj        ind_lt_mj        ind_lt_lj
   ------------------------------------------------------------------
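As a rough illustration of how the three parameters map onto the regular
queue names, here is a minimal Python sketch. The limits are copied from the
table; the "smallest class that fits" rule, the handling of values exactly at
a boundary, and the function name pick_queue are assumptions made for
illustration. The actual routing is done by lsbatch and may differ. The debug
queue is left out because this file does not say how jobs reach it.

```python
# Illustrative only: the real routing is performed by lsbatch.
# Limits come from the regular-queue table; the selection rule is assumed.

RUN_CLASSES = [(5, "vst"), (50, "st"), (200, "mt"), (400, "lt")]  # hours
SIZE_CLASSES = [(8, 2, "sj"), (16, 4, "mj"), (64, 25, "lj")]      # threads, Gb


def pick_queue(threads, peak_mem_gb, run_hours, industrial=False):
    """Return the regular queue whose limits fit the request, or None."""
    # Smallest run-time class whose limit covers the (normalized) run time.
    run = next((name for limit, name in RUN_CLASSES if run_hours <= limit), None)
    # Smallest size class that fits both the thread count and the memory.
    size = next((name for max_threads, max_gb, name in SIZE_CLASSES
                 if 1 <= threads <= max_threads and peak_mem_gb < max_gb), None)
    if run is None or size is None:
        return None  # request exceeds every regular queue
    prefix = "ind_" if industrial else ""
    return f"{prefix}{run}_{size}"


print(pick_queue(1, 0.5, 1))    # the bsub defaults: -n1 -M512M -W60
print(pick_queue(12, 3, 48))
print(pick_queue(32, 20, 300, industrial=True))
```

A request that fits no class (for example, more than 64 threads) returns None
in this sketch.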
The bsub options are:
   -n   specify the number of threads (default = 1). This is the maximum
        number of active processes/threads at any given time during the
        lifetime of the job. If the job uses different numbers of
        processors over its lifetime, you must specify the maximum
        number used.
   -M   specify the job peak memory limit (default = 512 Mb). This is the
        sum of the memory usage for all processes/threads in the job.
        Memory usage is reported by the ps(1) command as RSS, the total
        resident size of the process, which includes only those pages of
        the process that are physically resident in memory.
        If no unit is specified, kilobytes are assumed. Specify K, M, or G
        for kilobytes, megabytes, and gigabytes respectively.
        NOTE: The unit specification for the -M option works only at NCSA.
        It is not standard LSF.
   -W   specify the total job run time (default = 60 mins). The syntax is
        [hour:]minute. Run time is defined as the wall clock time for the
        job, excluding the time used for mass storage transfers and time
        that the job may be suspended by the system.
        NOTE: Because the NCSA Origin array is composed of both 195MHz and
        250MHz processors, the run time for a job can vary depending on
        the host on which it runs. The run time on the 250MHz hosts is
        normalized by a factor of 250/195. Run time limits for a job are
        based on the normalized run time.
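To make the -M and -W value formats and the run time normalization concrete,
here is a small Python sketch. The helper names are hypothetical. It assumes
1 M = 1024 K and 1 G = 1024 M, and that "normalized by a factor of 250/195"
means the actual run time on a 250MHz host is multiplied by 250/195; neither
assumption is confirmed by this file.

```python
# Hypothetical helpers; not part of LSF or of the NCSA tools.
UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}  # assumed binary multiples


def parse_mem_kb(spec):
    """Convert a -M value such as '512M' or '2G' to kilobytes."""
    spec = spec.strip()
    if spec and spec[-1] in UNIT_KB:
        return int(spec[:-1]) * UNIT_KB[spec[-1]]
    return int(spec)  # no unit given: kilobytes


def parse_runtime_min(spec):
    """Convert a -W value in [hour:]minute form to minutes."""
    hours, _, minutes = spec.rpartition(":")
    return int(hours or 0) * 60 + int(minutes)


def normalized_minutes(actual_min, host_mhz):
    """Scale run time to 195MHz-equivalent minutes (assumed: multiply)."""
    return actual_min * host_mhz / 195


print(parse_mem_kb("512M"), parse_runtime_min("2:30"))
print(normalized_minutes(195, 250))
```

Under these assumptions, a job that runs for 195 actual minutes on a 250MHz
host counts as 250 normalized minutes against its -W limit.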
Charging for jobs is based on normalized CPU time. The busage command
reports both the actual and normalized CPU time and run time.
You can use the busage command to help determine accurate limits for a job.
We recommend that you set the limits to about 110% of the usage reported by
busage. After a job has finished, enter:
   busage [jobId]
The number of processes/threads used by the job (bsub with the -n
option) is reported as:
   number of processes/threads: XXX
The peak memory usage (bsub with the -M option) is reported as:
   peak memory: XXXX
The run time (bsub with the -W option) is reported as:
   runtime: XXXX
   runtime (normalized): XXXX
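The 110% recommendation can be worked through as follows. This is a minimal
Python sketch with made-up usage figures (95 normalized minutes, 700 Mb peak
memory, 8 threads); the helper names are hypothetical, and the figures must be
read from the busage output by hand because its exact layout is not
reproduced here.

```python
# Hypothetical helpers for turning busage figures into bsub limits.

def with_headroom(value, percent=110):
    """Scale an integer usage figure to `percent` of itself, rounding up."""
    return (value * percent + 99) // 100


def format_w(minutes):
    """Format minutes in the [hour:]minute form accepted by -W."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}:{mins:02d}" if hours else str(mins)


# Made-up figures standing in for a busage report.
runtime_min = 95   # runtime (normalized), in minutes
peak_mem_mb = 700  # peak memory, in megabytes
threads = 8        # number of processes/threads

print("-n", threads,
      "-M", f"{with_headroom(peak_mem_mb)}M",
      "-W", format_w(with_headroom(runtime_min)))
```

Integer arithmetic is used so that rounding up never overshoots because of
floating-point error.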
Notes
-----
   i.   For jobs that do not specify any or all of these parameters, bsub
        will supply the defaults (-n1, -M512M, -W60).
   ii.  We strongly recommend not specifying a queue name when submitting
        jobs to the above listed default queues. For jobs that do specify a
        queue name, the values that you include for -n, -M, and -W (or get
        by default) are used to accept or reject a job. That is, if the
        values of the parameters fit the limits of the queue, the job is
        accepted; if not, the job is rejected.
   iii. The queues prefixed with ind_ are industrial queues restricted to
        NCSA industrial partners and run at a higher priority. Jobs
        belonging to industrial users are automatically routed to these
        queues. Industrial users who wish to submit jobs to the
        non-industrial default queues need to specify the queue name in
        addition to the parameters.
   iv.  The limits on the queues are for lsbatch queue selection purposes
        only; individual job limits are based on the parameters (-n, -M,
        and -W) specified.
   v.   To get the peak memory used by a job, use the busage command and
        take the value from the peak memory entry under the "Information
        collected by sampling the running job" listing.
   vi.  The environment variable $BSUB_NUMTHREADS is set to the number
        specified in the bsub -n option, so it can be used to set the
        number of threads for your program. For example:
           setenv MP_SET_NUMTHREADS $BSUB_NUMTHREADS
           mpirun -np $BSUB_NUMTHREADS a.out
(b) Dedicated Queues
The dedicated queues are meant for benchmarking. There is a premium on
charges for jobs run in the dedicated queues. See /usr/news/Charging_algorithm
for information. Currently, the following queues are available:
   Queue Name       No. of       Memory   Job                          Service
                    Processors   (Gb)     Time Limit                   Level
   ----------------------------------------------------------------------------
   128_ded_short    128          76       50 hours wall clock time     Normal dedicated
   ind_128_ded_st   128          76       50 hours wall clock time     Priority dedicated
   128_ded_med      128          76       200 hours wall clock time    Normal dedicated
   ind_128_ded_mt   128          76       200 hours wall clock time    Priority dedicated
   128_ded_long     128          76       400 hours wall clock time    Normal dedicated
   ind_128_ded_lt   128          76       400 hours wall clock time    Priority dedicated
   ----------------------------------------------------------------------------
The ind (industrial) queues have priority over the regular queues within
each time class.
Also, if there are no jobs queued in the dedicated queues, the machines will
run jobs in the vst queues. A dedicated job that is queued afterwards must
therefore wait until the currently running vst jobs are done. This can
take up to 5 hours (the run time limit in the vst queues).
To submit jobs to the dedicated queues, specify the -q option to bsub. You
do not need to specify the parameters -n, -M, and -W for jobs in these queues;
however, you are encouraged to use -W.
Tips on running in the dedicated queues are available at:
http://www.ncsa.uiuc.edu/UserInfo/Consulting/Tips/Dedicated.html
Users whose codes do not scale to the number of processors available in the
dedicated queues can still use the dedicated queues to run multiple jobs for
faster turnaround. Information on doing this is available at:
http://www.ncsa.uiuc.edu/UserInfo/Consulting/Tips/dplace.html
Job Submission and Control
--------------------------
The easiest way to run a job in the batch system is via a shell script.
A sample script is available in the directory /usr/local/doc/lsf/samples
that you can modify for your own use.
You can pass options to lsbatch through the bsub command, either on the
command line or embedded in the job script.
Once you have created a job script, submit the job to lsbatch using the
bsub command as follows (the script can have embedded bsub options):
   bsub < script_name
In this case, the script file is spooled by lsbatch.
NOTE:
   bsub {list of bsub options} script_name/executable
also works; in this case all bsub options need to be on the command line
(embedded bsub options are ignored) and the script file is NOT spooled by
lsbatch.
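For users who drive submissions from a program, the command-line form can be
assembled as in the Python sketch below. It only builds the argument list for
"bsub {options} script_name" from the -n, -M, -W, and -q options described in
this file; the function name bsub_args and the script name job.csh are
hypothetical, and nothing is submitted.

```python
# Builds, but does not run, a bsub command line.

def bsub_args(script, threads=None, mem=None, runtime=None, queue=None):
    """Return an argv list for 'bsub {options} script'."""
    args = ["bsub"]
    for flag, value in (("-n", threads), ("-M", mem),
                        ("-W", runtime), ("-q", queue)):
        if value is not None:      # omitted options fall back to bsub defaults
            args += [flag, str(value)]
    return args + [script]


print(" ".join(bsub_args("job.csh", threads=4, mem="1G", runtime="2:00")))
```

On a host where bsub is installed, the resulting list could be handed to
subprocess.run to submit the job. Remember that in this form any options
embedded in the script are ignored.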
Other useful bsub options:
   -J   specify job name
   -B   send mail when job starts
   -N   send mail when job ends
   -o   specify standard output file
   -P   specify project to be charged [**]
   [**] For users with multiple projects; use the 'usage' command to see
        your projects.
Useful commands (check the man page for usage and syntax):
   To check on the status of a job:       bjobs
   To kill a job in the batch system:     bkill [request_id]
   To get resource statistics on
   both running and completed jobs:       busage
   To get information on all
   active processes in a job:             bps
Access to Batch
---------------
A batch job may be submitted at any time. Batch jobs run at all times.
There is currently a limit of 5 total (queued and running) jobs.
Disk Space for Batch Jobs
-------------------------
Each machine in the Origin array has a local XFS scratch filesystem.
For performance and reliability reasons, we *strongly* recommend running
jobs in the machine-local scratch directory rather than in NFS-mounted
non-local scratch (e.g., scratch-modi4). Use of non-local scratch is at
your own risk. No refunds will be issued for batch jobs that fail due to
use of non-local scratch.
Each batch job has a per-job scratch directory, which is created on the
local scratch directory on the executing host when the job starts. The
directory is named based on the batch jobID and the start time of the job.
The name of this directory is available to batch job scripts in the $SCR
environment variable.
For example, job 229106 that started on Feb 5, 1999 at 11:43:22 on machine
jord1 would have $SCR set to:
/scratch-jord1/LSBATCH/229106.5Feb1999114322
To go to the scratch directory associated with this job, at the shell prompt,
enter:
% cd /scratch-jord1/LSBATCH/229106*
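The naming pattern in this example can be sketched in Python as shown below.
The format (day without a leading zero, three-letter English month, four-digit
year, then HHMMSS) is inferred from this single example and is an assumption,
as is the function name scratch_dir. Inside a job script, read $SCR rather
than rebuilding the name.

```python
from datetime import datetime

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def scratch_dir(host, job_id, start):
    """Rebuild a per-job scratch path from host, job ID, and start time."""
    stamp = f"{start.day}{MONTHS[start.month - 1]}{start.year}{start:%H%M%S}"
    return f"/scratch-{host}/LSBATCH/{job_id}.{stamp}"


print(scratch_dir("jord1", 229106, datetime(1999, 2, 5, 11, 43, 22)))
```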
See the sample scripts in /usr/local/doc/lsf/samples for examples of using
machine-local scratch.
Documentation
-------------
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/Origin2000/Doc/Jobs.html
has information on running jobs.
The directory /usr/local/doc/lsf contains PostScript versions of the
LSF User's Guide, Administrator's Guide, and Release Notes.
Silicon Graphics Origin2000:usr/news/lsf
Last Modified: February 10, 2003