This section describes keywords that were not mentioned in the previous
configuration steps. Unless your installation has special requirements
for any of these keywords, you can use them with their default
settings.
- ACTION_ON_MAX_REJECT = HOLD | SYSHOLD | CANCEL
- Specifies the state in which jobs are placed when their rejection count
has reached the value of the MAX_JOB_REJECT keyword. HOLD
specifies that jobs are placed in User Hold status; SYSHOLD specifies
that jobs are placed in System Hold status; CANCEL specifies that jobs
are canceled. The default is HOLD. When a job is rejected,
LoadLeveler sends a mail message stating why the job was rejected.
- ACTION_ON_SWITCH_TABLE_ERROR = program
- Where program is an administrator-supplied program that will be run when
DRAIN_ON_SWITCH_TABLE_ERROR is set to true and a switch
table unload error occurs. The default is to not run a program.
- AFS_GETNEWTOKEN = myprog
- Where myprog is an administrator-supplied program that, for
example, can be used to refresh an AFS token. The default is to not run
a program.
For more information, see Handling an AFS token.
- DCE_AUTHENTICATION_PAIR = program1, program2
- Where program1 and program2 are LoadLeveler-supplied or installation-supplied
programs that are used to authenticate DCE security credentials.
program1 obtains a handle (an opaque credentials object) at the time the job
is submitted; this handle is used to authenticate to DCE. program2 is the
path name of a LoadLeveler-supplied or installation-supplied program that uses the
handle obtained by program1 to authenticate to DCE before starting the job on
the executing machine(s).
For more information on DCE security credentials, see Handling DCE security credentials.
- DRAIN_ON_SWITCH_TABLE_ERROR = true | false
- When DRAIN_ON_SWITCH_TABLE_ERROR is set to true, the
startd will be drained when the switch table fails to
unload. This alerts the administrator that intervention may be
required to unload the switch table. The default is
false.
- HISTORY_PERMISSION = permissions | rw-rw----
- The permissions value of this keyword specifies the owner, group,
and world permissions of the history file associated with a LoadL_schedd
daemon. It must be a string of exactly nine characters, each of which is
r, w, x, or
-. The default is rw-rw----. LoadL_schedd
uses the default setting if the specified permissions are less than
rw-------.
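The HISTORY_PERMISSION rules above can be sketched as a small check. This is an illustration only, not LoadLeveler code: the function name is hypothetical, "less than rw-------" is assumed to mean that the owner lacks read or write access, and falling back to the default for a malformed string is also an assumption.

```python
import re

DEFAULT_HISTORY_PERMISSION = "rw-rw----"


def effective_history_permission(value: str) -> str:
    """Return the permission string this sketch would put into effect."""
    # The value must be exactly nine characters drawn from r, w, x, and -.
    # Assumption: a malformed string falls back to the default.
    if not re.fullmatch(r"[rwx-]{9}", value):
        return DEFAULT_HISTORY_PERMISSION
    # Assumption: "less than rw-------" means the owner cannot both
    # read and write the history file.
    if value[0] != "r" or value[1] != "w":
        return DEFAULT_HISTORY_PERMISSION
    return value
```

Under these assumptions, a value such as rw-r--r-- is accepted as written, while r--r--r-- falls back to rw-rw----.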
- MACHINE_UPDATE_INTERVAL = number
- Where number specifies the time period, in seconds, during which
machines must report to the central manager. Machines that do not
report in this number of seconds are considered down. The
default is 300 seconds.
- MAX_JOB_REJECT = number
- Where number specifies the number of times a job can be rejected
before it is removed (canceled) or put in User Hold or System Hold
status. That is, a rejected job is redispatched until the
MAX_JOB_REJECT value is reached. The default is -1, meaning
a job is redispatched an unlimited number of times. A job that cannot
run for various reasons (such as a uid mismatch, unavailable
resources, or wrong permissions) on one machine will be rejected on that
machine, and LoadLeveler will attempt to run the job on another
machine. A value of 0 means that if the job is rejected, it is
immediately removed. (For related information, see the
NEGOTIATOR_REJECT_DEFER keyword in this section.)
- NEGOTIATOR_INTERVAL = number
- Where number specifies the interval, in seconds, at which the negotiator
daemon performs a "negotiation loop" during which it attempts to
assign available machines to waiting jobs. A negotiation loop also
occurs whenever job states or machine states change. The default is 30
seconds.
- NEGOTIATOR_CYCLE_DELAY = number
- Where number specifies the time, in seconds, the negotiator
delays between periods when it attempts to schedule jobs. This time is
used by the negotiator daemon to respond to queries, reorder job queues,
collect information about changes in the states of jobs, etc. Delaying
the scheduling of jobs might improve the overall performance of the negotiator
by preventing it from spending excessive time attempting to schedule
jobs. The NEGOTIATOR_CYCLE_DELAY value must be less than the
NEGOTIATOR_INTERVAL. The default is 0 seconds.
- NEGOTIATOR_LOADAVG_INCREMENT = number
- Where number specifies the value the negotiator adds to the
startd machine's load average whenever a job in the Pending state is
queued on that machine. This value is used to compensate for the
increased load caused by starting another job. The default is
0.5.
- NEGOTIATOR_PARALLEL_DEFER = number
- Where number specifies the amount of time in seconds that defines
how long a job stays out of the queue after it fails to get the correct number
of processors. This keyword applies only to the default LoadLeveler
scheduler. This keyword must be greater than the
NEGOTIATOR_INTERVAL value; if it is not, the default
is used. The default, set internally by LoadLeveler, is
NEGOTIATOR_INTERVAL multiplied by 5.
- NEGOTIATOR_PARALLEL_HOLD = number
- Where number specifies the amount of time in seconds that defines
how long a job is given to accumulate processors. This keyword applies
only to the default LoadLeveler scheduler. This keyword must be greater
than the NEGOTIATOR_INTERVAL value; if it is not, the default
is used. The default, set internally by LoadLeveler, is
NEGOTIATOR_INTERVAL multiplied by 5.
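The fallback rule shared by NEGOTIATOR_PARALLEL_DEFER and NEGOTIATOR_PARALLEL_HOLD can be illustrated with a short calculation. This is a minimal sketch, not LoadLeveler code: the function name is hypothetical, and None is used here to stand for a keyword that is not set in the configuration file.

```python
def effective_parallel_interval(configured, negotiator_interval=30):
    """Return the value, in seconds, that this sketch would put into effect.

    configured is the administrator's setting, or None if the keyword is unset.
    negotiator_interval defaults to 30, the NEGOTIATOR_INTERVAL default.
    """
    default = 5 * negotiator_interval
    # The value must be strictly greater than NEGOTIATOR_INTERVAL;
    # otherwise the internally computed default is used.
    if configured is None or configured <= negotiator_interval:
        return default
    return configured
```

With the default NEGOTIATOR_INTERVAL of 30 seconds, an unset or too-small value yields 150 seconds, while a setting of 200 is used as given.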
- NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = number
- Where number specifies the amount of time, in seconds, between
recalculations of the SYSPRIO values for waiting jobs. The
default is 120 seconds. Recalculating the priority can be
CPU-intensive; specifying low values for the
NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL keyword may lead to a heavy
CPU load on the negotiator if a large number of jobs are running or
waiting for resources. A value of 0 means the SYSPRIO values
are not recalculated.
You can use this keyword to base the order in which jobs are run on the
current number of running, queued, or total jobs for a user or a group.
For more information, see Step 6: Prioritize the queue maintained by the negotiator.
- NEGOTIATOR_REJECT_DEFER = number
- Where number specifies the amount of time in seconds the
negotiator waits before it considers scheduling a job to a machine that
recently rejected the job. The default is 120 seconds. (For
related information, see the MAX_JOB_REJECT keyword in this
section.)
- NEGOTIATOR_REMOVE_COMPLETED = number
- Where number is the amount of time in seconds that you want the
negotiator to keep information regarding completed and removed jobs so that
you can query this information using the llq command. The
default is 0 seconds.
- NEGOTIATOR_RESCAN_QUEUE = number
- Where number specifies the amount of time, in seconds, that the negotiator
waits before rescanning the job queue for machines that have bypassed jobs
that could not run because of conditions that may change over
time. This keyword must be greater than the
NEGOTIATOR_INTERVAL value; if it is not, the default is
used. The default is 900 seconds.
- OBITUARY_LOG_LENGTH = number
- Where number specifies the number of lines from the end of the
log file that are appended to the mail message. The master daemon mails
this log to the LoadLeveler administrators when one of the daemons
dies. The default is 25.
- POLLING_FREQUENCY = number
- Where number specifies the interval, in seconds, at which the
startd daemon evaluates the load on the local machine and decides whether to
suspend, resume, or abort jobs. This is also the minimum interval at
which the kbdd daemon reports keyboard or mouse activity to the startd
daemon. The default is 5 seconds.
- POLLS_PER_UPDATE = number
- Where number specifies how often, in POLLING_FREQUENCY
intervals, the startd daemon updates the central manager. Due to the
communication overhead, it is impractical to do this with the frequency
defined by the POLLING_FREQUENCY keyword. Therefore, the
startd daemon only updates the central manager every nth (where
n is the number specified for POLLS_PER_UPDATE) local
update. Change POLLS_PER_UPDATE when changing the
POLLING_FREQUENCY. The default is 24.
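The interaction between POLLING_FREQUENCY and POLLS_PER_UPDATE comes down to one multiplication, sketched below. The function names are hypothetical. The comparison against MACHINE_UPDATE_INTERVAL is an inference from that keyword's description (machines that do not report within the interval are considered down), not a rule stated for these keywords.

```python
def central_manager_update_period(polling_frequency=5, polls_per_update=24):
    """Seconds between startd updates to the central manager."""
    # startd reports every nth local poll, where n is POLLS_PER_UPDATE.
    return polling_frequency * polls_per_update


def reports_in_time(update_period, machine_update_interval=300):
    """Assumed sanity check: the update period should be shorter than
    MACHINE_UPDATE_INTERVAL so the machine is not considered down."""
    return update_period < machine_update_interval
```

With the defaults, startd updates the central manager every 5 x 24 = 120 seconds, which fits within the default MACHINE_UPDATE_INTERVAL of 300 seconds. Raising POLLING_FREQUENCY to 30 without lowering POLLS_PER_UPDATE would stretch the period to 720 seconds, which is why the two keywords should be changed together.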
- PUBLISH_OBITUARIES = true | false
- Where true specifies that the master daemon sends mail to the
administrator(s), identified by the LOADL_ADMIN keyword, when any of
the daemons it manages dies abnormally.
- RESTARTS_PER_HOUR = number
- Where number specifies how many times the master daemon attempts
to restart a daemon that dies abnormally. Because one or more of the
daemons may be unable to run due to a permanent error, the master only
attempts $(RESTARTS_PER_HOUR) restarts within a 60-minute
period. Failing that, it sends mail to the administrator(s) identified
by the LOADL_ADMIN keyword and exits. The default is
12.
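The restart limit described for RESTARTS_PER_HOUR can be pictured as a counter over a 60-minute window. The sketch below is an illustration under stated assumptions, not the master daemon's actual logic: the class name is hypothetical, and a sliding window is assumed because the text does not say how the 60-minute period is measured.

```python
from collections import deque


class RestartLimiter:
    """Allow at most restarts_per_hour restart attempts per window."""

    def __init__(self, restarts_per_hour=12, window_seconds=3600):
        self.limit = restarts_per_hour
        self.window = window_seconds
        self.attempts = deque()  # timestamps of recent restart attempts

    def may_restart(self, now):
        """Record and allow a restart at time `now` (seconds), or refuse it."""
        # Forget attempts that have aged out of the window (assumed sliding).
        while self.attempts and now - self.attempts[0] >= self.window:
            self.attempts.popleft()
        if len(self.attempts) >= self.limit:
            # Limit reached: per the text, the master would mail the
            # administrators and exit at this point.
            return False
        self.attempts.append(now)
        return True
```

For example, with a limit of 2, attempts at 0 and 10 seconds are allowed, a third attempt at 20 seconds is refused, and an attempt at 3600 seconds is allowed again because the first attempt has aged out of the window.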
- SCHEDD_INTERVAL = number
- Where number specifies the interval, in seconds, at which the
schedd daemon checks the local job queue and updates the negotiator
daemon. The default is 60 seconds.
- VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE | FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY
- Specifies which algorithm the central manager uses to decide whether a
machine has enough virtual memory to meet the image_size
requirement of a job step. The default for this keyword is
FREE_PAGING_SPACE. The values of free real memory and free paging space
that LoadLeveler computes should closely track the values reported by
the svmon -G command.
If VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY and all
other requirements are met, then a machine having 1.2 GB of free real
memory and 0.9 GB of free paging space (2.1 GB free) should be
able to start a job step that has an image size requirement of 2 GB
(image_size = 2000000).
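The difference between the two VM_IMAGE_ALGORITHM settings can be worked through with the figures from the example above. This is a minimal sketch, not the central manager's code: the function name is hypothetical, all quantities are assumed to be in kilobytes (consistent with image_size = 2000000 for 2 GB), and the greater-than-or-equal comparison is an assumption.

```python
def has_enough_virtual_memory(image_size_kb, free_paging_kb, free_real_kb,
                              algorithm="FREE_PAGING_SPACE"):
    """Decide whether a machine can satisfy a job step's image_size."""
    if algorithm == "FREE_PAGING_SPACE":
        available_kb = free_paging_kb
    elif algorithm == "FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY":
        available_kb = free_paging_kb + free_real_kb
    else:
        raise ValueError(f"unknown VM_IMAGE_ALGORITHM value: {algorithm}")
    # Assumption: the machine qualifies when available memory is at
    # least the requested image size.
    return available_kb >= image_size_kb
```

For the machine in the example (900000 KB of free paging space and 1200000 KB of free real memory), a 2000000 KB job step qualifies under FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY but not under the default FREE_PAGING_SPACE setting.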
- WALLCLOCK_ENFORCE = true | false
- Where true specifies that the wall_clock_limit on
the job will be enforced. The WALLCLOCK_ENFORCE keyword is
only valid when the External Scheduler is enabled.