Running Multiple Jobs in the Dedicated Queues on the NCSA Origin2000

  1. Introduction
  2. Placement Files
  3. Batch script

1. Introduction

Users whose codes do not scale to the number of processors available in the dedicated queues (128 or 256) can still use the dedicated queues to run multiple jobs. For best performance, start each job with the tool dplace. dplace executes a specified program, initializing the processes and memory of that program on the nodes that you specify. This eliminates the potential for poor performance resulting from multiple threads executing on the same processor. As with a single dedicated job, do not use all the processors on the system: leave 1-2 processors free for use by the operating system and other system processes.

Note that for benchmarking runs, SGI recommends running jobs one at a time rather than using dplace to run multiple jobs at a time.

The following two man pages document dplace:

  • man 1 dplace documents the command-line arguments.
  • man 5 dplace documents the syntax of the placement file you use to specify program placement.

The section Non-MP Library Programs and Dplace in Chapter 8 of the SGI manual Origin2000 and Onyx2 Performance Tuning and Optimization Guide also has detailed information on dplace.

To place data according to the file placement_file for the executable a.out that would normally be run by:

a.out
you would use:
dplace -place placement_file a.out
For MPI jobs that would normally be run by:
mpirun -np n a.out
you would use:
setenv MPI_DSM_OFF
mpirun -np n dplace -place placement_file a.out

2. Placement Files

Each Origin node contains two CPUs, and each module contains four nodes. The 128 processor systems have modules numbered

   1  2  3  4  5  6  7  8  11  12  13  14  15  16  17  18
while the 256 processor systems (as of Aug 20, 1999) have modules numbered
   1  2  3  4  5  6  7  8  11  12  13  14  15  16  17  18
   21  22  23  24  25  26  27  28  31  32  33  34  35  36  37  38
Note: Prior to Aug 20, 1999, the modules on the 256 processor systems were numbered:
   1  2  3  4  5  6  7  8  13  14  15  16  17  18  19  20
   21  22  23  24  25  26  27  28  33  34  35  36  37  38  39  40

The dplace command can be used to specify physical nodes for each job that will run simultaneously.

For example, to run four 31-processor jobs in the 128-processor dedicated queues, assign four modules (32 CPUs) to each job as follows:

JOB     MODULES
 1       1   2   3   4
 2       5   6   7   8
 3      11  12  13  14
 4      15  16  17  18
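
The assignment above is the module list split into consecutive groups of four. A minimal Python sketch of that split follows, assuming the 128-processor module numbering listed earlier. The names MODULES_128 and assign_modules are hypothetical helpers for illustration, not part of dplace.

```python
# Module numbers of the 128-processor systems, as listed above.
MODULES_128 = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18]

CPUS_PER_NODE = 2      # each Origin node contains two CPUs
NODES_PER_MODULE = 4   # each module contains four nodes


def assign_modules(modules, num_jobs):
    """Split the module list into num_jobs consecutive, equal-sized groups."""
    if len(modules) % num_jobs != 0:
        raise ValueError("modules do not divide evenly among jobs")
    size = len(modules) // num_jobs
    return [modules[i * size:(i + 1) * size] for i in range(num_jobs)]


groups = assign_modules(MODULES_128, 4)
for job, group in enumerate(groups, start=1):
    cpus = len(group) * NODES_PER_MODULE * CPUS_PER_NODE
    print(f"job {job}: modules {group} ({cpus} CPUs available)")
```

Each group of four modules provides 32 CPUs, so a 31-processor job leaves one CPU in its group unused.
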
The placement file for the first job would be:
  • Native shared memory parallel jobs:
    memories 16 in topology physical near \
        /hw/module/1/slot/n1/node \
        /hw/module/1/slot/n2/node \
        /hw/module/1/slot/n3/node \
        /hw/module/1/slot/n4/node \
        /hw/module/2/slot/n1/node \
        /hw/module/2/slot/n2/node \
        /hw/module/2/slot/n3/node \
        /hw/module/2/slot/n4/node \
        /hw/module/3/slot/n1/node \
        /hw/module/3/slot/n2/node \
        /hw/module/3/slot/n3/node \
        /hw/module/3/slot/n4/node \
        /hw/module/4/slot/n1/node \
        /hw/module/4/slot/n2/node \
        /hw/module/4/slot/n3/node \
        /hw/module/4/slot/n4/node
    threads 31
    distribute threads 0:30 across memories
    

  • MPI jobs:
    memories 16 in topology physical near \
        /hw/module/1/slot/n1/node \
        /hw/module/1/slot/n2/node \
        /hw/module/1/slot/n3/node \
        /hw/module/1/slot/n4/node \
        /hw/module/2/slot/n1/node \
        /hw/module/2/slot/n2/node \
        /hw/module/2/slot/n3/node \
        /hw/module/2/slot/n4/node \
        /hw/module/3/slot/n1/node \
        /hw/module/3/slot/n2/node \
        /hw/module/3/slot/n3/node \
        /hw/module/3/slot/n4/node \
        /hw/module/4/slot/n1/node \
        /hw/module/4/slot/n2/node \
        /hw/module/4/slot/n3/node \
        /hw/module/4/slot/n4/node
    threads 32
    distribute threads 0:30 across memories
    
The other three jobs use similar placement files that specify their own modules. Note that an MPI job has an extra mpirun process, so the number of threads specified is 32 (31 + 1).
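
Because the four placement files differ only in their module numbers, they can be generated rather than typed by hand. The following Python sketch reproduces the layout of the two examples above. It assumes the /hw/module/M/slot/nN/node path pattern and the threads and distribute lines exactly as shown in those examples; the function name placement_file and its parameters are hypothetical. Check the generated text against man 5 dplace before using it.

```python
def placement_file(modules, num_procs, mpi=False):
    """Return placement-file text in the layout of the examples above."""
    # Four node slots (n1..n4) per module, as in the examples.
    nodes = [f"/hw/module/{m}/slot/n{n}/node"
             for m in modules for n in range(1, 5)]
    # MPI jobs count one extra thread for the mpirun process.
    threads = num_procs + 1 if mpi else num_procs
    lines = [f"memories {len(nodes)} in topology physical near \\"]
    # Every node line except the last ends with a continuation backslash.
    lines += [f"    {path} \\" for path in nodes[:-1]]
    lines.append(f"    {nodes[-1]}")
    lines.append(f"threads {threads}")
    # Thread range copied from the examples above: 0 through num_procs - 1.
    lines.append(f"distribute threads 0:{num_procs - 1} across memories")
    return "\n".join(lines) + "\n"


# Placement file for the second job (modules 5-8), native shared memory.
print(placement_file([5, 6, 7, 8], 31), end="")
```

Calling placement_file([5, 6, 7, 8], 31, mpi=True) gives the MPI variant with threads 32.
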

Note 1: Use of the above dplace construct in the timeshared queues is not recommended because interactions with other user jobs running on the system can cause poor performance.

Note 2: Use of the above dplace construct is not appropriate for codes that use POSIX threads.

3. Batch script

In a batch job, the commands for native shared memory parallel jobs would be:
setenv MP_SET_NUMTHREADS 31
dplace -place placement_file1 a.out &
dplace -place placement_file2 a.out &
dplace -place placement_file3 a.out &
dplace -place placement_file4 a.out &
wait
For MPI jobs:
setenv MPI_DSM_OFF
mpirun -np 31 dplace -place placement_file1 a.out &
mpirun -np 31 dplace -place placement_file2 a.out &
mpirun -np 31 dplace -place placement_file3 a.out &
mpirun -np 31 dplace -place placement_file4 a.out &
wait
Note that the wait command will wait for all the background subjobs to complete before going on to the next command in the batch script (for example, msscmd commands to save output files).

Ideally, all the subjobs will do approximately the same amount of computation and will execute in approximately the same amount of time. This minimizes the wall-clock time (and thus the charges for the job) spent waiting for one or more of the subjobs to complete.
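
To illustrate why balance matters: because of the wait command, the batch job lasts as long as its slowest subjob, and the processors of the subjobs that finish early sit idle until then. The Python sketch below works this out for made-up run times. The numbers and the function name idle_processor_hours are hypothetical and are not measurements or an account of how charges are computed.

```python
def idle_processor_hours(subjob_hours, procs_per_subjob):
    """Processor-hours spent idle while finished subjobs wait for the slowest."""
    wall_clock = max(subjob_hours)  # the batch job ends with the slowest subjob
    return sum((wall_clock - h) * procs_per_subjob for h in subjob_hours)


balanced = [2.0, 2.0, 2.0, 2.0]     # hypothetical run times in hours
unbalanced = [1.0, 1.0, 1.0, 2.0]   # three subjobs finish an hour early

for name, hours in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, "wall clock:", max(hours),
          "idle processor-hours:", idle_processor_hours(hours, 31))
```

Both cases occupy the queue for two hours of wall-clock time, but in the unbalanced case three 31-processor subjobs each wait one hour, which is 93 idle processor-hours.
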