NCSA Scheduler tool released on GitHub

07.01.15

NCSA has released Scheduler, software to help users submit a large number of independent jobs to the queue for Blue Waters and other high-performance computing systems, on GitHub at https://github.com/ncsa/Scheduler. Scheduler was developed by Victor Anisimov, a senior research programmer for the Blue Waters project.

On HPC systems, job submission is typically managed by batch-queuing systems such as Torque or PBS. These systems are tailored toward parallel MPI jobs that may use from a few to thousands of compute nodes per application. A job is submitted to the queue with the qsub command, and each submitted job receives a job ID.

If a researcher needs to run thousands of single-node jobs rather than a single job that uses a thousand nodes, the batch-queuing system becomes cumbersome. For instance, to submit 1,000 single-node jobs, a user has to prepare a separate batch script for each job and type the qsub command 1,000 times, which queues 1,000 independent jobs. Furthermore, many HPC sites limit the number of jobs a user may submit to the queue; typically the cap is around 50.

Another limitation of standard batch-queuing systems is that they do not allow node sharing between applications. For instance, if a user needs to submit 3,200 single-core jobs, the standard approach is to submit one job per node, even though this leaves most of the cores on each node idle. On the 32-core Blue Waters nodes, this method would leave 31 of the 32 cores on every node idle.

Scheduler allows a user to aggregate single-core jobs into a single batch job. That means the user can submit one job that bundles those 3,200 jobs with the help of a very simple configuration file. Scheduler also lets applications share a node. For instance, a user needs only 100 nodes on Blue Waters (3,200 jobs at 32 cores per node) to run 3,200 single-core jobs at the same time.
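The node counts above follow from simple arithmetic. The short Python sketch below reproduces them, assuming every job needs exactly one core and using the 32-core node size quoted for Blue Waters. The function name nodes_needed is hypothetical and is not part of Scheduler.

```python
def nodes_needed(num_jobs: int, cores_per_node: int) -> int:
    """Nodes required to run all single-core jobs at the same time."""
    # Integer ceiling division: a partly filled node still counts as one node.
    return -(-num_jobs // cores_per_node)


# One job per node, as with a standard batch queue: one core used per node.
one_job_per_node = nodes_needed(3200, 1)

# Node sharing on 32-core nodes: 32 single-core jobs per node.
with_node_sharing = nodes_needed(3200, 32)

print(one_job_per_node, with_node_sharing)
```

With the figures from the text, node sharing reduces the request from 3,200 nodes to 100.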

Scheduler also queues jobs internally. That means if a user has 3,200 short-running single-core jobs, the user is not required to use 100 nodes on Blue Waters; one node or 10 nodes may be sufficient. Whenever a core completes a job, it picks up the next job from the configuration file. In addition to efficiently managing single-core jobs, Scheduler can also bundle single-node OpenMP jobs. (Note: Scheduler cannot bundle MPI jobs.)
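The pick-up-the-next-job behavior described above is a common work-queue pattern. The Python sketch below illustrates the idea on a single machine with a fixed pool of workers. It is not Scheduler's own code or configuration format, it does not distribute work across nodes, and the names run_job and run_all as well as the sample job list are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one independent job as a subprocess and return its exit code."""
    return subprocess.run(command).returncode


def run_all(commands, num_workers):
    """Run every command using a fixed number of worker slots.

    All commands are placed in the pool's internal queue; each worker takes
    the next pending command as soon as it finishes its current one.
    Exit codes are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Hypothetical job list: 8 trivial jobs sharing 4 worker slots.
    jobs = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(8)]
    exit_codes = run_all(jobs, num_workers=4)
    print(exit_codes)
```

Because workers pull jobs as they become free, the number of worker slots can be much smaller than the number of jobs, which is the reason a handful of nodes can work through thousands of short jobs.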

Scheduler consists of fewer than 200 lines of source code. Users are free to integrate this code into their workflows to further automate the job submission process.

Scheduler is available at https://github.com/ncsa/Scheduler.


Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.