Running Jobs on NCSA's SGI Altix UV

NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.illinois.edu
217-244-0710
help@ncsa.illinois.edu

  1. Overview
  2. Interactive Use
  3. Queues
  4. Scheduling Policies
  5. Batch Commands
    1. qsub
      1. qsub -I
    2. qstat
    3. qhist
    4. qs
    5. qdel
    6. qps
    7. qhosts
  6. Sample Batch Scripts
  7. Disk Space for Batch Jobs
  8. Automated Saving of Files from Batch Jobs

1. Overview

The NCSA SGI Altix UV uses the Altair Portable Batch System (PBS) Pro with the Moab Workload Manager for running jobs. To keep every job within the memory of its host (for best performance) and to improve system uptime, batch jobs must specify their memory requirement, and that requirement is enforced.

2. Interactive Use

The access node ember.ncsa.illinois.edu is available for interactive use. User limits (for all active login sessions) are as follows:

  • a maximum of 4 processes per job
  • 4 Gbyte memory per process
  • CPU time of 30 mins per process
Jobs that exceed these limits will be terminated. In general, interactive use should be limited to compiling and other development tasks (such as editing source code and debugging) and to limited staging of files. Use the batch system for all other jobs. See the section on qsub -I for instructions on how to run an interactive job on the compute nodes.

3. Queues

The following queues are currently available for users:

Queue               Wall Clock Limit   Max # Cores   Max Memory
-----------------------------------------------------------------
debug               30 minutes         24            127 GB
normal              100 hours          192           1000 GB
long (3/16/11)      200 hours          192           1000 GB
dedicated (1)(2)    50 hours           378           2000 GB

(1) Queue not scheduled automatically. Please send email to consult@ncsa.illinois.edu to run jobs in this queue.

(2) Jobs in the dedicated queue will be charged for all cores on the host regardless of how many processors the job uses.
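The queue limits above amount to a simple lookup. The following is a minimal sketch, not an NCSA-provided tool: the function name pick_queue is hypothetical, and it just returns the first queue in the table whose wall-clock, core, and memory limits all cover a request (wall time in minutes, memory in GB). The limits are copied from the table and may change.

```shell
# pick_queue WALL_MINUTES NCPUS MEM_GB
# Hypothetical helper: prints the first queue, in table order, whose wall-clock,
# core, and memory limits all cover the request. Limits copied from the queue
# table; they may change over time.
pick_queue() {
  local wall_min=$1 ncpus=$2 mem_gb=$3
  # name:max_wall_minutes:max_cores:max_mem_gb (100 h = 6000 min, 200 h = 12000 min)
  local queues=("debug:30:24:127" "normal:6000:192:1000" "long:12000:192:1000")
  local entry name max_wall max_cores max_mem
  for entry in "${queues[@]}"; do
    IFS=: read -r name max_wall max_cores max_mem <<< "$entry"
    if (( wall_min <= max_wall && ncpus <= max_cores && mem_gb <= max_mem )); then
      echo "$name"
      return 0
    fi
  done
  # The dedicated queue (50 h, 378 cores, 2000 GB) is only run by special request.
  if (( wall_min <= 3000 && ncpus <= 378 && mem_gb <= 2000 )); then
    echo "dedicated"
    return 1
  fi
  echo "none"
  return 2
}

pick_queue 20 6 20      # prints: debug
pick_queue 600 120 500  # prints: normal
```

A non-zero return status signals that the request cannot be scheduled automatically and needs an email to consult@ncsa.illinois.edu (or a smaller request).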

4. Scheduling Policies

The scheduling policy on Ember is set to favor jobs with large memory or large core counts. Jobs on Ember cannot span more than one SMP compute host, and timeshared jobs may not be larger than 192 cores (or the equivalent amount of memory). Larger jobs are serviced by the dedicated queue, which runs jobs only by special request.

For timeshared jobs, the scheduler allocates cpusets (cores and their associated memory) to jobs so that different jobs don't share resources. The smallest cpuset on Ember is 6 cores with around 30 GB of usable memory. Each job is allocated enough cpusets to cover both its core request and its memory request, so allocations come in multiples of 6 cores.
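As a rough illustration of this rounding, the sketch below computes how many cores a request would occupy, assuming that each 6-core cpuset carries about 30 GB of usable memory. The function name cpuset_cores is hypothetical and the 30 GB figure is the approximate value quoted above; the scheduler's exact accounting may differ.

```shell
# cpuset_cores NCPUS MEM_GB
# Hypothetical helper: cores a timeshared job would occupy, assuming 6-core
# cpusets with roughly 30 GB of usable memory each (an approximation).
cpuset_cores() {
  local ncpus=$1 mem_gb=$2
  local cores_per_set=6 gb_per_set=30
  # Ceiling division: cpusets needed for the core request and for the memory request.
  local sets_for_cpus=$(( (ncpus + cores_per_set - 1) / cores_per_set ))
  local sets_for_mem=$(( (mem_gb + gb_per_set - 1) / gb_per_set ))
  # The job needs enough cpusets to satisfy both requests.
  local sets=$(( sets_for_cpus > sets_for_mem ? sets_for_cpus : sets_for_mem ))
  echo $(( sets * cores_per_set ))
}

cpuset_cores 6 20    # prints: 6
cpuset_cores 6 100   # prints: 24 (100 GB needs four cpusets)
cpuset_cores 12 20   # prints: 12
```

Under this assumption a memory-heavy request such as 6 cores with 100 GB ties up 24 cores, which is why accurate memory requests matter for both scheduling and charging.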

As with other HPC systems at NCSA, the scheduling policy includes fair-share. Under this policy, a job's priority may be increased or decreased depending on the other jobs that the user's project is running or has recently run. To give everyone a fair opportunity to run jobs, a user's job gets a higher priority if users in the same project haven't run jobs in the recent past. Fair-share also takes into account the ratio of the service units allocated to the user's project to the time remaining before the allocation expires.

To maximize utilization, the scheduler also back-fills jobs. While the scheduler gathers a large block of cores for a large job, there are often "holes": cores that sit idle waiting to be added to the pool for that job. The scheduler increases system utilization and job throughput by opportunistically scheduling smaller, shorter jobs into these holes, which would otherwise leave resources idle.

It is in the user's best interest to make job wall time requests as accurate as possible. If everyone defaults to the maximum time allowed in the queue, there will be no opportunistic back-fill. Even if only a significant fraction of submitted jobs default to the maximum wall time, back-fill will tend to fragment available resources, because its prediction of the "holes" will be inaccurate; as a result, most jobs wind up waiting longer to start.
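The effect of wall time requests on back-fill shows up even in a very simplified model: a waiting job can be slotted into a hole only if it needs no more cores than are idle and its requested wall time ends before the large job holding those cores is due to start. The sketch below encodes just that test. The function name is hypothetical, and this is a deliberate simplification, not Moab's actual back-fill algorithm.

```shell
# can_backfill JOB_CORES JOB_WALL_MIN IDLE_CORES MIN_UNTIL_RESERVATION
# Simplified, hypothetical back-fill check: succeeds (exit status 0) when the job
# fits into the current hole both in cores and in requested wall time.
can_backfill() {
  local job_cores=$1 job_wall_min=$2 idle_cores=$3 min_until_reservation=$4
  (( job_cores <= idle_cores && job_wall_min <= min_until_reservation ))
}

# Example: 48 idle cores are held for a large job due to start in 3 hours.
if can_backfill 24 120 48 180; then echo "2-hour request: back-filled"; fi
if ! can_backfill 24 6000 48 180; then echo "100-hour request: must wait"; fi
```

The two example jobs are identical except for the requested wall time; only the one with an accurate, short request can use the idle cores.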

When figuring out a job's priority relative to other jobs, there are several factors which are taken into account. Some of these factors include:

  • job size (how many cores)
  • job expansion factor (the ratio of the time the job has spent eligible to be run versus how much time the job has requested)
  • the raw amount of time the job has spent eligible to be run
  • fair-share factors
A relative weighting of these factors contributes to a job's priority.

A debug queue is available to facilitate fast turnaround on debugging/testing jobs. Jobs in this queue have an intrinsically higher priority; additionally, they accrue priority at a much higher rate because the expansion factor (and its associated priority factor) increases very quickly.
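To make the "relative weighting" of priority factors concrete, here is a toy calculation. The weights, the fair-share value, and the function name are invented for illustration and are not Ember's configuration. The expansion factor is computed as defined above (eligible time divided by requested time), scaled by 100 to stay in integer shell arithmetic, which also shows why short requests gain priority quickly.

```shell
# job_priority CORES ELIGIBLE_MIN REQUESTED_MIN FAIRSHARE
# Toy priority score: a weighted sum of the factors listed above. All weights
# are invented for illustration. FAIRSHARE is a hypothetical adjustment:
# positive when the project has run little recently, negative when it has run a lot.
job_priority() {
  local cores=$1 eligible_min=$2 requested_min=$3 fairshare=$4
  local w_size=10 w_xfactor=5 w_eligible=1 w_fairshare=100
  # Expansion factor (eligible time / requested time), scaled by 100 for integer math.
  local xfactor_x100=$(( eligible_min * 100 / requested_min ))
  echo $(( w_size * cores + w_xfactor * xfactor_x100 + w_eligible * eligible_min + w_fairshare * fairshare ))
}

# Two 6-core jobs that have each been eligible for 60 minutes:
job_priority 6 60 30 0     # 30-minute request: prints 1120
job_priority 6 60 6000 0   # 100-hour request: prints 125
```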

To keep jobs from the long queue from dominating the system and making shorter jobs wait behind them, there is a limit on the number of cores that long-queue jobs may occupy at any one time. Given the fluid nature of our job load, this limit is adjusted from time to time, but in general we keep it between 1/4 and 1/3 of the available cores. When the limit is reached, subsequent jobs in the long queue may go into a blocked state until running jobs finish and free up resources; they are then automatically moved out of the blocked state and scheduled to run.
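The long-queue cap can be pictured as a single comparison. In the sketch below, the function name is hypothetical, the cap is assumed to be 1/4 of the cores (the low end of the range quoted above), and the 1200-core total is an arbitrary example, not Ember's actual size.

```shell
# long_queue_blocked RUNNING_LONG_CORES REQUEST_CORES TOTAL_CORES
# Hypothetical check: succeeds (exit status 0) when starting this long-queue job
# would push long-queue usage past the cap. The cap is assumed to be 1/4 of the
# cores here; the real value is set by NCSA and changes over time.
long_queue_blocked() {
  local running=$1 request=$2 total=$3
  local cap=$(( total / 4 ))
  (( running + request > cap ))
}

# With 1200 cores the assumed cap is 300 long-queue cores.
long_queue_blocked 192 96 1200 || echo "starts: 288 cores would be within the cap"
long_queue_blocked 240 96 1200 && echo "blocked: 336 cores would exceed the cap"
```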

5. Batch Commands

Below are brief descriptions of the most useful batch commands. For more detailed information, refer to the individual man pages or the PBS Users' Guide.

5.1. qsub

The qsub command is used to submit a batch job to a queue. All options to qsub can be specified either on the command line or as a line in a script (known as an embedded option). Command-line options take precedence over embedded options. Scripts can be submitted using:

qsub [list of qsub options] script_name

The main qsub options are listed below. The sample batch scripts illustrate qsub usage and options. Also see the qsub man page for other options.

  • -l resource_list: specifies resource limits. The resource_list argument is of the form:
    resource_name[=[value]][,resource_name[=[value]],...]
    

    The resource_names required are:

    walltime: maximum wall clock time (hh:mm:ss) [default: 10 mins]
    ncpus: the number of processors to use.
    mem: the total memory required for the job (all processors).

    Example:
    #PBS -l walltime=00:30:00 -l ncpus=6 -l mem=20gb
    

    It is important to provide an accurate estimate of the memory requirement because of the way the batch system allocates memory and processors.

    Notes:

    1. For timeshared jobs, the scheduler allocates cpusets (cores and associated memory) to jobs so different jobs don't share resources. The Altix UV systems have 6 cores per socket, with two sockets per node board. The minimum number of cores allocated to a job is 6. Charging will be based on the resources required to accommodate the core and memory specifications of the job.
    2. The memory specification for your job will be enforced so your job must run within the requested memory. Jobs will be terminated if they exceed their memory request.

  • -q queue_name: specifies the queue name. [default: normal]

  • -N jobname: specifies the job name.

  • -o out_file: stores the standard output of the job in file out_file. [default: <jobname>.o<PBS_JOBID>]

  • -j oe: merge standard output and standard error into standard output file.

  • -k oe: place standard output and standard error files in your $HOME directory. The filenames will be of the form <jobname>.o<PBS_JOBID> and <jobname>.e<PBS_JOBID> respectively. If this option is used in conjunction with -j oe, standard output and standard error are combined into standard output file. The -k option overrides the -o option.

  • -V: export all your environment variables to the batch job.

  • -m be: send mail at the beginning and end of a job.

  • -A psn: charge your job to a specific project (PSN). This is needed only by users who belong to more than one PSN.
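Putting several of these options together, a job script header might look like the one written by the sketch below. It is wrapped in a shell function only so that it can be regenerated for different core counts, memory sizes, and wall times. The function name write_pbs_script, the job name myjob, and the program ./my_program are hypothetical placeholders; $PBS_O_WORKDIR is the standard PBS variable holding the directory qsub was run from. The -A line is omitted because it is only needed by users on more than one PSN.

```shell
# write_pbs_script FILE NCPUS MEM_GB WALLTIME [QUEUE]
# Hypothetical helper that writes a PBS job script using the qsub options above.
write_pbs_script() {
  local file=$1 ncpus=$2 mem_gb=$3 walltime=$4 queue=${5:-normal}
  cat > "$file" <<EOF
#!/bin/bash
#PBS -q ${queue}
#PBS -N myjob
#PBS -l walltime=${walltime}
#PBS -l ncpus=${ncpus}
#PBS -l mem=${mem_gb}gb
#PBS -j oe
#PBS -m be
#PBS -V

# Run from the directory the job was submitted from.
cd "\$PBS_O_WORKDIR"
./my_program
EOF
}

# Usage: write a 6-core, 20 GB, 30-minute debug job and submit it.
#   write_pbs_script myjob.pbs 6 20 00:30:00 debug
#   qsub myjob.pbs
```

Remember that the memory request is enforced, so mem should cover the program's peak usage.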

5.1.1. qsub -I

The -I option tells qsub you want to run an interactive job. You can also use other qsub options such as those documented in the batch sample scripts (/usr/local/doc/pbs/samples/). For example, the following command:

   qsub -I -V -l walltime=00:30:00,ncpus=6,mem=20gb -q debug

will run an interactive job with a wall clock limit of 30 minutes, using 6 cores and 20 gigabytes of memory.

After you enter the command, you will have to wait for PBS to start the job. As with any job, your interactive job will wait in the queue until the resources are available. For jobs needing 30 minutes of wall time or less, specify the debug queue for higher priority. Once the job starts, you will see something like this:

	This job will be charged to account: ags (TG-STA060003)
	qsub: waiting for job 38989.ember to start
	qsub: job 38989.ember ready

	----------------------------------------
	!Begin PBS Prologue Wed Mar  2 10:24:32 2011
	Job ID:    38989
	Username:  sjohn
	Group:     aaa

	Creating Batch Directory 38989 in /scratch/batch/
 
	----------------------------------------
	set_SCR: using existing PBS job directory /scratch/batch/38989

	[sjohn@ember-cmp2 ~]$

When you are done with your interactive commands, you can use the exit command to end the job:

	[sjohn@ember-cmp2 ~]$ exit
	logout
 
	qsub: job 38989.ember completed

You will be charged for the wall time used until you end the job.

Note: For running applications that require an X display on a compute host in an interactive batch job, please see Using VNC on Ember.

5.2. qstat

The qstat command displays the status of PBS batch jobs.
  • qstat -a gives the status of all jobs on the system.
  • qstat -n lists nodes allocated to a running job in addition to basic information.
  • qstat -f PBS_JOBID gives detailed information on a particular job.
  • qstat -q provides summary information on all the queues.

See the man page for other options available.

5.3. qhist

The qhist command summarizes the raw accounting record(s) for one or more jobs. See the output of "qhist --help" for details.
NOTE: SU charges for a job are available the day after the job completes.

To display information about a specific job, the syntax is qhist PBS_JOBID.

$ qhist 238

Scanning PBS raw accounting records: 07/14/2010 - 09/27/2010

Compute Host:       ember-cmp3
JobId:              238
JobName:            testjob
User:               arnoldg
TG acct (project):  -local proj- (aaa)
Queue:              normal

Job limits:
  wall clock:       10:00:00    
  Requested CPUs:   120      
  Requested Memory: 500mb    

Queued:             09/24/10 13:58
Started:            09/24/10 13:59
Ended:              09/25/10 00:01

Usage:
  wall clock:       10:02:03    
     cputime:       00:01:26    
         SUs:       60.20       
      memory:        19.32M   

qhist can also produce tables of information from the PBS raw accounting records. For example, to create a table of your jobs that started between September 24, 2010 and September 25, 2010, run the following command:
$ qhist -S 9/24/10,9/25/10 

Scanning PBS raw accounting records: 09/22/2010 - 09/27/2010


  JobId  JobName       NCPU  Stat  StartDate       EndDate              SUs  
---------------------------------------------------------------------------
    220  D                1     E  09/24/10 12:28  09/24/10 12:29      0.12
    221  D                1     E  09/24/10 12:30  09/24/10 13:19      4.88
    222  D               12     E  09/24/10 13:20  09/24/10 13:39      1.83
    225  testjob         64     E  09/24/10 13:41  09/24/10 13:41      0.01
    226  testjob         64     E  09/24/10 13:43  09/24/10 13:43      0.01
    227  testjob         64     E  09/24/10 13:45  09/24/10 13:45      0.01
    229  asIF!           32     E  09/24/10 13:47  09/24/10 13:50      0.28
    230  asIF!           32     E  09/24/10 13:51  09/24/10 13:51      0.02
    238  testjob        120     E  09/24/10 13:59  09/25/10 00:01     60.20
    239  testjob        120     E  09/24/10 14:01  09/25/10 00:03     60.20
    240  testjob        120     E  09/24/10 14:01  09/25/10 00:03     60.20
    241  testjob         64     E  09/25/10 00:11  09/25/10 00:11      0.01
---------------------------------------------------------------------------
Total # jobs = 12
Total # SUs  = 187.77

Notes:

  • Depending upon the search criteria, qhist may search through records for a couple days on both ends of the date range you specify in order to collect more information about the job(s).
  • The Stat column displays the last known status of the job:
    • Q - queued
    • H - queued and in hold state
    • D - deleted (delete record found, but end record not found)
    • S - started running (start record found)
    • E - ended, either normally or via deletion (end record found)
    • A - aborted by the batch server, for example because the user is over quota or a job dependency (-W depend) failed.
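Because qhist -S prints a plain fixed-width table, its output can be post-processed with standard tools. As a sketch, the hypothetical function below totals SUs per job name with awk. It assumes the column layout shown in the example table (JobId, JobName, NCPU, Stat, StartDate, EndDate, SUs, which is nine whitespace-separated fields per job row) and one-word job names; header, separator, and total lines are skipped.

```shell
# su_by_jobname: read qhist -S table output on stdin, print total SUs per job name.
# Hypothetical helper; assumes one-word job names and 9 fields per job row:
# JobId JobName NCPU Stat StartDate(2 fields) EndDate(2 fields) SUs
su_by_jobname() {
  awk '$1 ~ /^[0-9]+$/ && NF == 9 { sus[$2] += $9 }
       END { for (name in sus) printf "%-12s %10.2f\n", name, sus[name] }' | sort
}

# Usage:
#   qhist -S 9/24/10,9/25/10 | su_by_jobname
```

Applied to the example table, this would report 6.83 SUs for D, 0.30 for asIF!, and 180.64 for testjob, which together match the reported total of 187.77.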

5.4. qs

The qs command displays a detailed table of the status of PBS batch jobs and can be used as an alternative to the qstat command. Information such as job dependencies, queue wait time, elapsed time, and the execution host of running jobs can be viewed at a glance.

     qs 
will display all queued and running jobs.

See qs -h for options.

5.5. qdel

The qdel command deletes a queued job or kills a running job. The syntax is qdel PBS_JOBID.
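To delete several jobs at once, qdel can be combined with qselect, the standard PBS command that prints the IDs of jobs matching its selection options (-u for the job owner, -s for the job state). The sketch below, with a hypothetical function name, deletes all of your jobs that are still queued; check the qselect man page on Ember before relying on it.

```shell
# delete_my_queued_jobs: hypothetical helper that deletes every job owned by
# the current user that is still in the queued (Q) state.
delete_my_queued_jobs() {
  local ids
  ids=$(qselect -u "$(id -un)" -s Q) || return 1
  # Nothing to do if no jobs are queued.
  [ -n "$ids" ] || return 0
  # Word splitting on $ids is intended: one argument per job ID.
  qdel $ids
}
```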

5.6. qps

The qps command prints ps-style information for processes* running on the Ember compute hosts.

[* For threaded (for example, OpenMP) processes, only the first thread is shown.]

     qps
will display the process information for all your processes on the Ember compute hosts.
     qps -j JOBID
will display the process information for a particular job.

See qps -h for other options.

5.7. qhosts

The qhosts command summarizes PBS information for hosts and provides counts of claimed CPU and MEM for each host on Ember. The load average, memory used, and uptime data for each host are also provided.

6. Sample Batch Scripts

Sample batch scripts are available in the directory /usr/local/doc/batch_scripts for use as a template.

7. Disk Space for Batch Jobs

Scratch space for batch jobs is provided via a per-job scratch directory that is created at the beginning of the job. This directory is created under /scratch/batch and is named after the JobID. If your batch script is based on one of the sample scripts, the name of this scratch directory is available in the $SCR environment variable.

Your job scratch directory may be deleted soon (possibly immediately) after your job completes, so take care to transfer results to the mass storage system (see the section Automated Saving of Files from Batch Jobs).
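A common pattern inside a job script is to stage input into the job's scratch directory, run there, and copy results back before the job ends. The sketch below assumes $SCR has been set (as the sample scripts do) and copies results back to $PBS_O_WORKDIR, the directory the job was submitted from; the function name and arguments are hypothetical. For archiving results to the mass storage system, the saveafterjob utility described below remains the documented route.

```shell
# run_in_scratch INPUT_FILE OUTPUT_FILE COMMAND [ARGS...]
# Hypothetical helper: copy INPUT_FILE from the submission directory into $SCR,
# run COMMAND there, then copy OUTPUT_FILE back to the submission directory.
run_in_scratch() {
  local input=$1 output=$2
  shift 2
  local workdir=${PBS_O_WORKDIR:-$PWD}
  [ -n "${SCR:-}" ] && [ -d "$SCR" ] || { echo "SCR is not set to a directory" >&2; return 1; }
  cp "$workdir/$input" "$SCR/" || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$SCR" && "$@" ) || return 1
  cp "$SCR/$output" "$workdir/" || return 1
}

# Usage inside a job script (my_program is a placeholder):
#   run_in_scratch input.dat results.out "$PBS_O_WORKDIR/my_program" input.dat
```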

The cdjob command can be used to change the working directory to the scratch directory of a running batch job. The syntax is

cdjob PBS_JOBID

8. Automated Saving of Files from Batch Jobs

The saveafterjob utility is available for automated, guaranteed saving of output files from batch jobs to the mass storage system. For details on its use, see the saveafterjob page and the sample PBS batch scripts.
