IBM LoadLeveler for AIX 5L: Using and Administering

Step 3: Specify class stanzas

The information in a class stanza defines characteristics for that class. Class stanzas are optional. Class stanzas take the following format. Default values for keywords appear in bold.

Figure 32. Format of a class stanza


label: type = class
admin = list
ckpt_dir = directory
ckpt_time_limit = hardlimit,softlimit
class_comment = "string"
default_resources = name(count) name(count)...name(count)
exclude_groups = list
exclude_users = list
execution_factor = number
include_groups = list
include_users = list
master_node_requirement = true | false
maxjobs = number
max_node = number
max_processors = number
max_total_tasks = number
nice = value
NQS_class = true | false
NQS_submit = name
NQS_query = queue names
priority = number
total_tasks = number
core_limit = hardlimit,softlimit
cpu_limit = hardlimit,softlimit
data_limit = hardlimit,softlimit
file_limit = hardlimit,softlimit
job_cpu_limit = hardlimit,softlimit
rss_limit = hardlimit,softlimit
stack_limit = hardlimit,softlimit
wall_clock_limit = hardlimit,softlimit

You can specify the following keywords in a class stanza:

admin = list
Where list is a blank-delimited list of administrators for this class. These administrators can hold, release, and cancel jobs in this class.

ckpt_dir = directory
Where directory is the directory location to be used for checkpoint files that did not have a directory name specified in the job command file. If the value specified does not have a fully qualified directory path (including the beginning forward slash), the initial working directory will be inserted before the specified value.

The value specified by the ckpt_dir keyword is only used when the ckpt_file keyword in the job command file does not contain a full path name and the ckpt_dir keyword in the job command file is not specified. For more information on determining the checkpoint directory, see Naming checkpoint files and directories.

class_comment = "string"
Where string is text characterizing the class. This information appears when the user is building a job command file using the GUI and requests Choice information on the classes to which he or she is authorized to submit jobs. The comment string associated with this keyword cannot contain an equal sign (=) or a colon (:) character. The length of the string cannot exceed 1024 characters.

default_resources = name(count) name(count)...name(count)

Specifies the default amount of resources consumed by a task of a job step of this class, provided that no resources keyword is coded for the step in the job command file. If a resources keyword is coded for a job step, it overrides any default resources defined for the job class.

The administrator defines the name and count values for default_resources. In addition, name(count) can also be ConsumableCpus(count), ConsumableMemory(count units), or ConsumableVirtualMemory(count units). ConsumableMemory and ConsumableVirtualMemory are the only two consumable resources that can be specified with both a count and units. The count for each specified resource must be an integer greater than or equal to zero, with three exceptions: ConsumableCpus and ConsumableMemory must each be specified with a value greater than zero, and ConsumableVirtualMemory must be specified with a value that is greater than zero and greater than or equal to the image_size (units for image_size are in kilobytes). If the count is not valid, LoadLeveler issues an error message and does not submit the job. The allowable units are those normally used with LoadLeveler data limits:

b bytes
w words
kb kilobytes (2**10 bytes)
kw kilowords (2**12 bytes)
mb megabytes (2**20 bytes)
mw megawords (2**22 bytes)
gb gigabytes (2**30 bytes)
gw gigawords (2**32 bytes)
tb terabytes (2**40 bytes)
tw terawords (2**42 bytes)
pb petabytes (2**50 bytes)
pw petawords (2**52 bytes)
eb exabytes  (2**60 bytes)
ew exawords  (2**62 bytes)

The ConsumableMemory and ConsumableVirtualMemory values are stored in MB (megabytes) and rounded up. Therefore, the smallest amount of ConsumableMemory or ConsumableVirtualMemory that you can request is one megabyte. If no units are specified, megabytes are assumed. Resources defined here that are not in the SCHEDULE_BY_RESOURCES list in the global configuration file will not affect the scheduling of the job.
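The rounding rule above can be illustrated with a minimal Python sketch. This is not LoadLeveler code: the helper name consumable_memory_mb is hypothetical, and the size of a plain word ("w") is inferred as 4 bytes from the kilowords entry in the units list.

```python
# Bytes per unit, built from the units list above.
# Assumption: a word is 4 bytes, inferred from kw = 2**12 bytes.
UNIT_BYTES = {
    prefix + kind: size << (10 * power)
    for power, prefix in enumerate(["", "k", "m", "g", "t", "p", "e"])
    for kind, size in (("b", 1), ("w", 4))
}

MEGABYTE = 2**20


def consumable_memory_mb(count, units="mb"):
    """Return a ConsumableMemory request in whole megabytes, rounded up.

    Hypothetical helper: megabytes are assumed when no units are given,
    and the count must be an integer greater than zero.
    """
    if not isinstance(count, int) or count <= 0:
        raise ValueError("count must be an integer greater than zero")
    total_bytes = count * UNIT_BYTES[units]
    # Integer ceiling division, so large values stay exact.
    return (total_bytes + MEGABYTE - 1) // MEGABYTE


print(consumable_memory_mb(1, "kb"))     # smallest possible request is 1 MB
print(consumable_memory_mb(1536, "kb"))  # 1.5 MB rounds up to 2 MB
```

Under these assumptions, any request smaller than one megabyte is still stored as one megabyte.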

exclude_groups = list
Where list is a blank-delimited list of groups who are not allowed to submit jobs of that class name. Do not specify both a list of included groups and a list of excluded groups. Only one of these may be used for any class. The default is that no groups are excluded.

exclude_users = list
Where list is a blank-delimited list of users who are not permitted to submit jobs of that class name. Do not specify both a list of included users and a list of excluded users. Only one of these may be used for any class. The default is that no users are excluded.

execution_factor = number
Specifies how much processing time jobs of this class will receive relative to other jobs operating on the same node. For example, if job A has an execution_factor of 2 and job B has an execution_factor of 1, then LoadLeveler will allocate twice the number of rows in the Gang matrix (and therefore twice the amount of processing time) to job A as to job B. The valid values for this keyword are 1, 2, and 3; the default is 1.
Note: This keyword is used by Gang scheduling only.
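As a rough illustration of the execution_factor example, the following Python sketch computes each job's share of Gang matrix rows. It assumes that the rows on a node are divided among exactly the listed jobs in proportion to their factors; the function name gang_row_shares is hypothetical and the real scheduler may weigh other inputs.

```python
from fractions import Fraction


def gang_row_shares(execution_factors):
    """Map each job name to its fraction of Gang matrix rows (illustrative only)."""
    for job, factor in execution_factors.items():
        if factor not in (1, 2, 3):
            raise ValueError(f"{job}: execution_factor must be 1, 2, or 3")
    total = sum(execution_factors.values())
    return {job: Fraction(factor, total) for job, factor in execution_factors.items()}


shares = gang_row_shares({"A": 2, "B": 1})
print(shares["A"], shares["B"])  # job A gets twice the share of job B
```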

include_groups = list
Where list is a blank-delimited list of groups who are allowed to submit jobs of that class name. If provided, this list limits groups of that class to those on the list. Do not specify both a list of included groups and a list of excluded groups. Only one of these may be used for any class. The default is to include all groups.

include_users = list
Where list is a blank-delimited list of users who are permitted to submit jobs of that class name. If provided, this list limits users of that class to those on the list. Do not specify both a list of included users and a list of excluded users. Only one of these may be used for any class. The default is to include all users.
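The include and exclude rules can be summarized in a short Python sketch. The function may_submit is hypothetical, and raising an error when both lists are given is an assumption: the text above only says not to specify both.

```python
def may_submit(user, include_users=None, exclude_users=None):
    """Return True if the user may submit jobs to the class (illustrative only)."""
    if include_users and exclude_users:
        # Assumption: treat the unsupported combination as a configuration error.
        raise ValueError("specify include_users or exclude_users, not both")
    if include_users:
        return user in include_users       # only listed users are allowed
    if exclude_users:
        return user not in exclude_users   # everyone except listed users
    return True                            # default: all users are included


print(may_submit("bob", include_users=["bob", "sally"]))
print(may_submit("green", exclude_users=["green", "judy"]))
```

The same logic would apply to include_groups and exclude_groups, with group names in place of user names.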

master_node_requirement = true | false
Where true specifies that parallel jobs in this class require the master node feature. For these jobs, LoadLeveler allocates the first node (called the "master") on a machine having the master_node_exclusive = true setting in its machine stanza. If most or all of your parallel jobs require this feature, you should consider placing the statement master_node_requirement = true in your default class stanza. Then, for classes that do not require this feature, you can use the statement master_node_requirement = false in their class stanzas to override the default setting. One machine per class should have the true setting; if more than one machine has this setting, normal scheduling selection is performed.
Note: master_node_requirement is ignored by the Gang scheduler.

maxjobs = number
Where number is the maximum number of jobs that can run in this class. If the class stanza does not specify maxjobs, or if there is no class stanza at all, the maximum jobs that can be simultaneously run in this class is defined in the default stanza. The default is -1, which means that no limit is placed on the number of jobs a user can submit.

max_node = number
Where number specifies the maximum number of nodes a user submitting jobs in this class can request for a parallel job in a job command file using the node keyword. The default is -1, which means there is no limit. The max_node keyword will not affect the use of the min_processors and max_processors keywords in the job command file.

max_processors = number
Where number specifies the maximum number of processors a user submitting jobs to this class can request for a parallel job in a job command file using the min_processors and max_processors keywords. The default is -1 which means that there is no limit.

max_total_tasks = number
Specifies the maximum total number of tasks that the scheduler will allocate at any given time to run the jobs of this class. The default value for this keyword is -1 which is unlimited.
Note: This keyword is used by Gang scheduling only.

nice = value
Where value is the amount by which the current UNIX nice value is incremented. The nice value is one factor in a job's run priority. The lower the number, the higher the run priority. If two jobs are running on a machine, the nice value determines the percentage of the CPU allocated to each job.

This value ranges from -20 to 20. Values out of this range are placed at the top (or bottom) of the range. For example, if your current nice value is 15, and you specify nice = 10, the resulting value is 20 (the upper limit) rather than 25. The default is 0.

If the administrator has decided to enforce consumable resources, the nice value will only adjust priorities of processes within the same WLM class. Because LoadLeveler defines a single class for every job step, the nice value has no effect.

For more information, consult the appropriate UNIX documentation.
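The range handling described for nice can be sketched in Python as follows. The function name resulting_nice is hypothetical; the bounds come from the text above.

```python
NICE_MIN, NICE_MAX = -20, 20


def resulting_nice(current, increment):
    """Add the class nice increment to the current value, clamped to the valid range."""
    return max(NICE_MIN, min(NICE_MAX, current + increment))


print(resulting_nice(15, 10))  # 20, the upper limit, rather than 25
```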

NQS_class = true | false
When true, any job submitted to this class will be routed to an NQS machine.

NQS_submit = name
Where name is the name of the NQS pipe queue to which the job will be routed. When the job is dispatched to LoadLeveler, LoadLeveler will invoke the qsub command using the name of this queue. There is no default.

NQS_query = queue names
Where queue names is a blank-delimited list of queue names (including host names if necessary) to be used with the qstat command to monitor the job and with the qdel command to cancel the job. There is no default.

For more information on routing jobs to machines running NQS, refer to Figure 18.

priority = number
Where number is an integer that specifies the priority for jobs in this class. The default is 0. The number specified for priority is referenced as ClassSysprio in the configuration file. You can use ClassSysprio when assigning job priorities. If the variable ClassSysprio does not appear in the SYSPRIO expression, then the priority specified here in the administration file is ignored. See Step 6: Prioritize the queue maintained by the negotiator for more information about the ClassSysprio keyword.

total_tasks = number
Where number specifies the maximum number of tasks a user submitting jobs in this class can request for a parallel job in a job command file using the total_tasks keyword. The default is -1, which means there is no limit.
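Several of the keywords above (maxjobs, max_node, max_processors, max_total_tasks, and total_tasks) use -1 to mean that there is no limit. A minimal Python sketch of that convention, using a hypothetical request_allowed helper, might look like this:

```python
NO_LIMIT = -1  # documented default for these keywords


def request_allowed(requested, class_limit=NO_LIMIT):
    """Return True if a requested count fits under the class limit (illustrative only)."""
    return class_limit == NO_LIMIT or requested <= class_limit


print(request_allowed(64))      # no limit configured
print(request_allowed(16, 15))  # exceeds a limit of 15
```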

Limit keywords

The class stanza includes the following limit keywords, which allow you to control the amount of resources used by a job step or a job process.

Table 17. Types of limit keywords

Limit              How It Is Enforced
ckpt_time_limit    Per job step
core_limit         Per process
cpu_limit          Per process
data_limit         Per process
file_limit         Per process
job_cpu_limit      Per job step
rss_limit          Per process
stack_limit        Per process
wall_clock_limit   Per job step

Individual keywords are described in Specifying limits in the class stanza. The following section gives you a general overview of limits.

Overview of limits

A limit is the amount of a resource that a job step or a process is allowed to use. (A process is a dispatchable unit of work.) A job step may be made up of several processes.

Limits include both a hard limit and a soft limit. When a hard limit is exceeded, the job is usually terminated. When a soft limit is exceeded, the job is usually given a chance to perform some recovery actions. For more information, see Exceeding limits.

Limits are enforced either per process or per job step, depending on the type of limit. For parallel job steps, which consist of multiple tasks running on multiple machines, limits are enforced on a per task basis.

For example, a common limit is the cpu_limit, which limits the amount of CPU time a single process can use. If you set cpu_limit to five hours and you have a job step that forks five processes, each process can use up to five hours of CPU time, for a total of 25 CPU hours. Another limit that controls the amount of CPU used is job_cpu_limit. For a serial job step, if you impose a job_cpu_limit of five hours, the entire job step (made up of all five processes) cannot consume more than five CPU hours. For information on using this keyword with parallel jobs, see job_cpu_limit.
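The arithmetic in this example can be written out as a small Python sketch for a serial job step. The function name max_step_cpu is hypothetical and the sketch only restates the two rules above; it does not model how the limits are enforced.

```python
from datetime import timedelta


def max_step_cpu(cpu_limit, processes, job_cpu_limit=None):
    """Upper bound on total CPU time for a serial job step (illustrative only)."""
    per_process_total = cpu_limit * processes  # cpu_limit applies to each process
    if job_cpu_limit is None:
        return per_process_total
    return min(per_process_total, job_cpu_limit)  # job_cpu_limit caps the whole step


five_hours = timedelta(hours=5)
print(max_step_cpu(five_hours, 5))              # five processes, five hours each
print(max_step_cpu(five_hours, 5, five_hours))  # capped by job_cpu_limit
```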

You can specify limits in either the class stanza of the administration file or in the job command file. The lower of these two limits will be used to run the job even if the system limit for the user is lower.

Exceeding limits

Process limits are enforced by the operating system. Job step limits are enforced by LoadLeveler.

Exceeding job step limits

When a hard limit is exceeded, LoadLeveler sends a non-trappable signal to the process (except in the case of a parallel job). When a soft limit is exceeded, LoadLeveler sends a trappable signal to the process. The following chart summarizes the actions that occur when a job step limit is exceeded:

Table 18. Exceeding job step limits

Type of Job | When a Soft Limit is Exceeded | When a Hard Limit is Exceeded
Serial | SIGXCPU or SIGKILL issued | SIGKILL issued
Parallel (non-PVM) | SIGXCPU issued to both the user program and to the parallel daemon | SIGTERM issued
PVM | SIGXCPU issued to the user program | pvm_halt invoked to shut down PVM

On systems that do not support SIGXCPU, LoadLeveler does not distinguish between hard and soft limits. When a soft limit is reached on these platforms, LoadLeveler issues a SIGKILL.

Exceeding per process limits

For per process limits, what happens when your job reaches and exceeds either the soft limit or the hard limit depends on the operating system you are using.

Note that when a job forks a process which exceeds a per process limit, such as the CPU limit, the operating system (and not LoadLeveler) terminates the process by issuing a SIGXCPU. As a result, you will not see an entry in the LoadLeveler logs indicating that the process exceeded the limit. The job will complete with a 0 return code. LoadLeveler can only report the status of any processes it has started.

If you need more specific information, refer to your operating system documentation.

Syntax

The syntax for setting a limit is

limit_type = hardlimit,softlimit

For example:

core_limit = 120kb,100kb

To specify only a hard limit, you can enter, for example:

core_limit = 120kb

To specify only a soft limit, you can enter, for example:

core_limit = ,100kb

In a keyword statement, you cannot have any blanks between the numerical value (100 in the above example) and the units (kb). Also, you cannot have any blanks to the left or right of the comma when you define a limit in a job command file.
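The three forms shown above can be separated with a few lines of Python. This is a sketch with a hypothetical split_limit helper; it rejects any blank inside the value, which covers both restrictions just described.

```python
def split_limit(value):
    """Split 'hardlimit,softlimit' into (hard, soft); a missing part becomes None."""
    if any(ch.isspace() for ch in value):
        raise ValueError("blanks are not allowed inside a limit value")
    hard, _, soft = value.partition(",")
    return (hard or None, soft or None)


print(split_limit("120kb,100kb"))  # both limits
print(split_limit("120kb"))        # hard limit only
print(split_limit(",100kb"))       # soft limit only
```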

For limit keywords that refer to a data limit -- such as data_limit, core_limit, file_limit, stack_limit, and rss_limit -- the hard limit and the soft limit are expressed as:

integer[.fraction][units]

The allowable units for these limits are:

b bytes
w words
kb kilobytes (2**10 bytes)
kw kilowords (2**12 bytes)
mb megabytes (2**20 bytes)
mw megawords (2**22 bytes)
gb gigabytes (2**30 bytes)
gw gigawords (2**32 bytes)
tb terabytes (2**40 bytes)
tw terawords (2**42 bytes)
pb petabytes (2**50 bytes)
pw petawords (2**52 bytes)
eb exabytes  (2**60 bytes)
ew exawords  (2**62 bytes)

If no units are specified for data limits, then bytes are assumed.
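To make the data limit format concrete, here is a Python sketch that converts a value such as 2.5mw to bytes. The helper data_limit_bytes is hypothetical. Two points are assumptions rather than documented behavior: a plain word is taken to be 4 bytes (inferred from kw = 2**12 bytes), and any fractional byte is truncated.

```python
import re
from fractions import Fraction

# Bytes per unit, built from the units list above.
UNIT_BYTES = {
    prefix + kind: size << (10 * power)
    for power, prefix in enumerate(["", "k", "m", "g", "t", "p", "e"])
    for kind, size in (("b", 1), ("w", 4))
}

# integer[.fraction][units], with no blank between the number and the units
_DATA_LIMIT = re.compile(r"(\d+(?:\.\d+)?)([a-z]*)")


def data_limit_bytes(text):
    """Convert a data limit string to a byte count (illustrative only)."""
    match = _DATA_LIMIT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a data limit: {text!r}")
    number, units = match.groups()
    units = units or "b"  # bytes are assumed when no units are given
    if units not in UNIT_BYTES:
        raise ValueError(f"unknown units: {units!r}")
    # Fraction keeps the arithmetic exact; int() truncates (an assumption).
    return int(Fraction(number) * UNIT_BYTES[units])


print(data_limit_bytes("5621kb"))
print(data_limit_bytes("2.5mw"))
```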

For limit keywords that refer to a time limit -- such as ckpt_time_limit, cpu_limit, job_cpu_limit, and wall_clock_limit -- the hard limit and the soft limit are expressed as:

[[hours:]minutes:]seconds[.fraction]

Fractions are rounded to seconds.
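The time format can be read from right to left: the last field is always seconds, and the optional fields before it are minutes and hours. A Python sketch with a hypothetical time_limit_seconds helper follows. Rounding halves upward is an assumption, since the text only says that fractions are rounded to seconds.

```python
import math


def time_limit_seconds(text):
    """Convert [[hours:]minutes:]seconds[.fraction] to whole seconds (illustrative only)."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a time limit: {text!r}")
    *leading, seconds = parts
    total = 0
    for field in leading:  # hours and/or minutes, most significant first
        total = total * 60 + int(field)
    # Assumption: a fraction of .5 or more rounds up to the next second.
    return total * 60 + math.floor(float(seconds) + 0.5)


print(time_limit_seconds("12:56:21"))  # hours:minutes:seconds
print(time_limit_seconds("30:45"))     # minutes:seconds
print(time_limit_seconds("10000"))     # seconds only
```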

You can use the following character strings with all limit keywords, except that copy cannot be used with wall_clock_limit, job_cpu_limit, or ckpt_time_limit:

rlim_infinity
Represents the largest positive number.
unlimited
Has same effect as rlim_infinity.
copy
Uses the limit currently active when the job is submitted.

See Table 19 for more information on specifying limits.

Table 19. Setting limits

If the hard limit: | Then the:
Is set in both the class stanza and the job command file | Smaller of the two limits is taken into consideration. If the smaller limit is the job limit, the job limit is then compared with the user limit set on the machine that runs the job. The smaller of these two values is used. If the limit used is the class limit, the class limit is used without being compared to the machine limit.
Is not set in either the class stanza or the job command file | User per process limit set on the machine that runs the job is used.
Is set in the job command file and is less than its respective job soft limit | The job is not submitted.
Is set in the class stanza and is less than its respective class stanza soft limit | Soft limit is adjusted downward to equal the hard limit.
Is specified in the job command file | Hard limit must be greater than or equal to the specified soft limit and less than or equal to the limit set by the administrator in the class stanza of the administration file.

Note: If the per process limit is not defined in the administration file and the hard limit defined by the user in the job command file is greater than the limit on the executing machine, then the hard limit is set to the machine limit.
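The selection rules for a per process hard limit can be sketched in Python as shown below. The function effective_hard_limit is hypothetical and follows only the first two rows of Table 19 and the note above. Two cases are assumptions because the table does not cover them: a class limit with no job limit is used as-is, and a job limit equal to the class limit is treated as the class limit. Limits are plain numbers in a common unit, with None meaning "not set".

```python
def effective_hard_limit(class_limit, job_limit, machine_limit):
    """Pick the hard limit used to run the job (illustrative only)."""
    if class_limit is None and job_limit is None:
        return machine_limit                  # neither is set: machine limit applies
    if class_limit is None:
        return min(job_limit, machine_limit)  # per the note: capped at the machine limit
    if job_limit is None:
        return class_limit                    # assumption: not covered by the table
    if job_limit < class_limit:
        return min(job_limit, machine_limit)  # job limit is then compared with the machine
    return class_limit                        # class limit is used without comparison


print(effective_hard_limit(100, 50, 30))   # job limit is smaller, then machine limit wins
print(effective_hard_limit(100, 200, 30))  # class limit is smaller and is used as-is
```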

Specifying limits in the class stanza

You can specify the following limit keywords:

ckpt_time_limit = hardlimit,softlimit
Where hardlimit,softlimit defines the maximum time that checkpointing a job can take. When LoadLeveler detects that the softlimit has been exceeded, it attempts to abort the checkpoint and allow the job to continue. If this is not possible, and the hard limit is exceeded, LoadLeveler will terminate the job. The start time of the checkpoint is defined as the time when the Startd daemon receives status from the starter that a checkpoint has started.

Examples:

ckpt_time_limit = 30:45          #hardlimit - 30 minutes 45 seconds
ckpt_time_limit = 30:45,25:00    #hardlimit - 30 minutes 45 seconds
                                 #softlimit - 25 minutes

core_limit = hardlimit,softlimit
Specifies the hard limit, soft limit, or both for the size of a core file.

Examples:

core_limit = unlimited
core_limit = 30mb

For more information, see Overview of limits.

cpu_limit = hardlimit,softlimit
Specifies hard limit, soft limit, or both for the CPU time to be used by each individual process of a job step. For example, if you impose a cpu_limit of five hours and you have a job step composed of five processes, each process can consume five CPU hours; the entire job step can therefore consume 25 total hours of CPU.

Examples:

cpu_limit = 12:56:21       # hardlimit = 12 hours 56 minutes 21 seconds
cpu_limit = 56:00,50:00    # hardlimit = 56 minutes 0 seconds 
                           # softlimit = 50 minutes 0 seconds
cpu_limit = 1:03           # hardlimit = 1 minute 3 seconds
cpu_limit = unlimited      # hardlimit = 2,147,483,647 seconds 
                           # (X'7FFFFFFF')
cpu_limit = rlim_infinity  # hardlimit = 2,147,483,647 seconds 
                           # (X'7FFFFFFF')
cpu_limit = copy           # current CPU hardlimit value on the 
                           # submitting machine.

For more information, see Overview of limits.

data_limit = hardlimit,softlimit
Specifies hard limit, soft limit, or both for the data segment to be used by each process of the submitted job.

Examples:

data_limit = 125621         # hardlimit = 125621 bytes
data_limit = 5621kb         # hardlimit = 5621 kilobytes
data_limit = 2mb            # hardlimit = 2 megabytes
data_limit = 2.5mw          # hardlimit = 2.5 megawords
data_limit = unlimited      # hardlimit = 9,223,372,036,854,775,807 bytes
                            # (X'7FFFFFFFFFFFFFFF')
data_limit = rlim_infinity  # hardlimit = 9,223,372,036,854,775,807 bytes 
                            # (X'7FFFFFFFFFFFFFFF')
data_limit = copy           # copy data hardlimit value from submitting machine.

For more information, see Overview of limits.

file_limit = hardlimit,softlimit
Specifies the hard limit, soft limit, or both for the size of a file. For more information, see Overview of limits.

job_cpu_limit = hardlimit,softlimit
Specifies the maximum total CPU time to be used by all processes of a job step.

For example:

job_cpu_limit = 10000

For more information on this keyword, see:

rss_limit = hardlimit,softlimit
Specifies the hard limit, soft limit, or both for the resident set size. For more information, see Overview of limits.

stack_limit = hardlimit,softlimit
Specifies the hard limit, soft limit, or both for the size of a stack. For more information, see Overview of limits.

wall_clock_limit = hardlimit,softlimit
Specifies the hard limit, soft limit, or both for the elapsed time for which a job can run. Note that LoadLeveler uses the time the negotiator daemon dispatches the job as the start time of the job. When a job is checkpointed, vacated, and then restarted, the wall_clock_limit is not adjusted to account for the amount of time that elapsed before the checkpoint occurred. This keyword is not supported for NQS jobs.

If you are running the Backfill or Gang scheduler, you must set a wall clock limit either in the job command file or in a class stanza (for the class associated with the job you submit). LoadLeveler administrators should consider setting a default wall clock limit in a default class stanza. For more information on setting a wall clock limit when using the Backfill or Gang scheduler, see Choosing a scheduler.

For more general information on limits, see Overview of limits.

Examples of class stanzas

Example 1: Creating a class that excludes certain users

class_a: type=class                # class that excludes users
priority=10                        # ClassSysprio
exclude_users=green judy           # Excluded users

Example 2: Creating a class for small-size jobs

small:  type=class                               # class for small jobs
priority=80                                      # ClassSysprio (max=100)
cpu_limit=00:02:00                               # 2 minute limit
data_limit=30mb                                  # max 30 MB data segment
default_resources=ConsumableVirtualMemory(10mb)  # resources consumed by each 
ConsumableCpus(1) resA(3) floatinglicenseX(1)    # task of a small job step if 
                                                 # resources are not explicitly 
                                                 # specified in the job command file
ckpt_time_limit=3:00,2:00                        # 3 minute hardlimit, 2 minute softlimit 
core_limit=10mb                                  # max 10 MB core file
file_limit=50mb                                  # max file size 50 MB
stack_limit=10mb                                 # max stack size 10 MB
rss_limit=35mb                                   # max resident set size 35 MB
include_users = bob sally                        # authorized users

Example 3: Creating a class for medium-size jobs

medium: type=class            # class for medium jobs
priority=70                   # ClassSysprio
cpu_limit=00:10:00            # 10 minute run time limit
data_limit=80mb,60mb          # max 80 MB data segment
                              # soft limit 60 MB data segment
ckpt_time_limit=5:00,4:30     # 5 minute hardlimit, 4 minute 30 second softlimit to checkpoint 
core_limit=30mb               # max 30 MB core file
file_limit=80mb               # max file size 80 MB
stack_limit=30mb              # max stack size 30 MB
rss_limit=100mb               # max resident set size 100 MB
job_cpu_limit=1800,1200       # hard limit is 30 minutes,
                              # soft limit is 20 minutes

Example 4: Creating a class for large-size jobs

large:  type=class             # class for large jobs
priority=60                    # ClassSysprio
cpu_limit=00:10:00             # 10 minute run time limit
data_limit=120mb               # max 120 MB data segment
default_resources=ConsumableVirtualMemory(40mb)         # resources consumed by each 
ConsumableCpus(2) resA(8) floatinglicenseX(1) resB(1)   # task of a large job step if 
                               # resources are not explicitly 
                               # specified in the job command file
ckpt_time_limit=7:00,5:00      # 7 minute hardlimit, 5 minute softlimit to checkpoint 
core_limit=30mb                # max 30 MB core file
file_limit=120mb               # max file size 120 MB
stack_limit=unlimited          # unlimited stack size
rss_limit=150mb                # max resident set size 150 MB
job_cpu_limit = 3600,2700      # hard limit 60 minutes
                               # soft limit 45 minutes
wall_clock_limit=12:00:00,11:59:55 # hard limit is 12 hours
 

Example 5: Creating a class to route jobs to NQS machines

nqs:   type=class               # class for NQS jobs
NQS_class=true
NQS_submit=pipe_queue           # NQS pipe queue name
NQS_query=one two three         # list of queue names

You can use the class names in control expressions in both the global and local configuration files.

Example 6: Creating a class for PVM jobs

PVM3:  type=class             # class for PVM jobs
priority=60                   # ClassSysprio (max=100)
max_processors=15             # maximum number of processors

Example 7: Creating a class for master node machines

sp-6hr-sp:  type=class         # class for master node machines
priority=50                    # ClassSysprio (max=100)
ckpt_time_limit=25:00,20:00    # 25 minute hardlimit, 20 minute softlimit to checkpoint 
cpu_limit = 06:00:00           # 6 hour limit
job_cpu_limit = 06:00:00       # hard limit is 6 hours
core_limit = 1mb               # max 1MB core file
master_node_requirement = true # master node definition

