IBM Books

IBM LoadLeveler for AIX 5L: Using and Administering

Step 17: Specify additional configuration file keywords

This section describes keywords that were not mentioned in the previous configuration steps. Unless your installation has special requirements for any of these keywords, you can use them with their default settings.

Note: For the keywords listed below that take a number as the value on the right side of the equal sign, that number must be a numeric value and cannot be an arithmetic expression.
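The following Python sketch illustrates the distinction. It accepts a literal such as 300 or .5 and rejects an expression such as 60*5. The function name and the regular expression are hypothetical, and this is not how LoadLeveler itself parses its configuration file.

```python
import re
from typing import Tuple

# A plain decimal literal such as "300", "-1", "0.5" or ".5".
# Hypothetical pattern, for illustration only.
NUMBER_LITERAL = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_numeric_keyword(line: str) -> Tuple[str, float]:
    """Split a 'KEYWORD = value' line, requiring a literal number as the value."""
    keyword, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {line!r}")
    keyword, value = keyword.strip(), value.strip()
    # An arithmetic expression such as "60*5" does not match and is rejected.
    if NUMBER_LITERAL.fullmatch(value) is None:
        raise ValueError(f"{keyword}: {value!r} is not a numeric value")
    return keyword, float(value)
```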

ACTION_ON_MAX_REJECT = HOLD | SYSHOLD | CANCEL
Specifies the state in which jobs are placed when their rejection count has reached the value of the MAX_JOB_REJECT keyword. HOLD specifies that jobs are placed in User Hold status; SYSHOLD specifies that jobs are placed in System Hold status; CANCEL specifies that jobs are canceled. The default is HOLD. When a job is rejected, LoadLeveler sends a mail message stating why the job was rejected.

ACTION_ON_SWITCH_TABLE_ERROR = program
Where program is an administrator-supplied program that will be run when DRAIN_ON_SWITCH_TABLE_ERROR is set to true and a switch table unload error occurs. The default is to not run a program.

AFS_GETNEWTOKEN = myprog
Where myprog is an administrator-supplied program that can be used, for example, to refresh an AFS token. The default is to not run a program.

For more information, see Handling an AFS token.

DCE_AUTHENTICATION_PAIR = program1, program2
Where program1 and program2 are LoadLeveler-supplied or installation-supplied programs that are used to authenticate DCE security credentials. At the time the job is submitted, program1 obtains a handle (an opaque credentials object) that is used to authenticate to DCE. program2 is the path name of a LoadLeveler-supplied or installation-supplied program that uses the handle obtained by program1 to authenticate to DCE before the job is started on the executing machine(s).

For more information on DCE security credentials, see Handling DCE security credentials.

DRAIN_ON_SWITCH_TABLE_ERROR = true | false
When DRAIN_ON_SWITCH_TABLE_ERROR is set to true, the startd daemon is drained when the switch table fails to unload. This alerts the administrator that intervention may be required to unload the switch table. The default is false.
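The interaction between DRAIN_ON_SWITCH_TABLE_ERROR and ACTION_ON_SWITCH_TABLE_ERROR can be summarized in a short Python sketch. The function name, the returned action strings, and the example path are hypothetical. The sketch assumes that the administrator-supplied program is not run when draining is disabled and that draining happens before the program runs; this section does not state the order.

```python
from typing import List, Optional


def switch_table_error_actions(drain_on_error: bool,
                               action_program: Optional[str]) -> List[str]:
    """Steps taken after a switch table fails to unload (hypothetical helper)."""
    actions: List[str] = []
    if drain_on_error:  # DRAIN_ON_SWITCH_TABLE_ERROR = true
        actions.append("drain startd")
        # Assumption: ACTION_ON_SWITCH_TABLE_ERROR runs only when draining is enabled.
        if action_program:
            actions.append(f"run {action_program}")
    return actions
```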

HISTORY_PERMISSION = permissions | rw-rw----
The permissions value of this keyword specifies the owner, group, and world permissions of the history file associated with a LoadL_schedd daemon. It must be a nine-character string consisting of the characters r, w, x, or -. The default is rw-rw----. LoadL_schedd uses the default setting if the specified permissions are less than rw-------.
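The following Python sketch shows how a nine-character permission string maps to an octal file mode and how the fallback to the default might be applied. It assumes that "less than rw-------" means the owner lacks read or write permission, and that a malformed string also falls back to the default. This section confirms neither detail, and the function names are hypothetical.

```python
from typing import Optional

DEFAULT_MODE = 0o660  # rw-rw----, the documented default


def permission_to_mode(perm: str) -> Optional[int]:
    """Convert a string such as 'rw-rw----' to an octal mode, or None if malformed."""
    if len(perm) != 9:
        return None
    mode = 0
    for i, ch in enumerate(perm):
        # Positions cycle through r, w, x for owner, group, and world.
        if ch == "rwx"[i % 3]:
            mode |= 1 << (8 - i)
        elif ch != "-":
            return None
    return mode


def effective_history_mode(perm: str) -> int:
    """Mode applied to the history file, falling back to rw-rw---- (assumption)."""
    mode = permission_to_mode(perm)
    # Assumption: "less than rw-------" means owner read or write is missing.
    if mode is None or (mode & 0o600) != 0o600:
        return DEFAULT_MODE
    return mode
```

Under this reading, rw-r----- (0640) is accepted as given, while r--r--r-- falls back to rw-rw---- (0660).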

MACHINE_UPDATE_INTERVAL = number
Where number specifies the time period, in seconds, during which machines must report to the central manager. Machines that do not report in this number of seconds are considered down. The default is 300 seconds.
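As a minimal sketch, the following Python function lists the machines that would be considered down, given the time at which each last reported to the central manager. The function name and data layout are hypothetical. Treating a report exactly MACHINE_UPDATE_INTERVAL seconds old as still current is an assumption.

```python
from typing import Dict, List

MACHINE_UPDATE_INTERVAL = 300  # seconds, the documented default


def down_machines(last_report: Dict[str, float], now: float,
                  interval: float = MACHINE_UPDATE_INTERVAL) -> List[str]:
    """Return, sorted, the machines that have not reported within `interval` seconds."""
    return sorted(name for name, reported_at in last_report.items()
                  if now - reported_at > interval)
```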

MAX_JOB_REJECT = number
Where number specifies the number of times a job can be rejected before it is removed (canceled) or put in User Hold or System Hold status. That is, a rejected job is redispatched until the MAX_JOB_REJECT value is reached. The default is -1, meaning a job is redispatched an unlimited number of times. A job that cannot run on one machine for some reason (such as a uid mismatch, unavailable resources, or wrong permissions) is rejected on that machine, and LoadLeveler attempts to run the job on another machine. A value of 0 means that if the job is rejected, it is immediately removed. (For related information, see the NEGOTIATOR_REJECT_DEFER keyword in this section.)
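The following Python sketch combines MAX_JOB_REJECT with ACTION_ON_MAX_REJECT. It assumes that the action is taken as soon as the rejection count reaches MAX_JOB_REJECT, which matches the description of ACTION_ON_MAX_REJECT. How a value of 0 interacts with HOLD or SYSHOLD is not spelled out above, so the sketch simply applies the configured action in every case. The function name and return strings are hypothetical.

```python
# Outcome once the rejection limit is reached, keyed by ACTION_ON_MAX_REJECT.
OUTCOMES = {"HOLD": "User Hold", "SYSHOLD": "System Hold", "CANCEL": "canceled"}


def after_rejection(reject_count: int, max_job_reject: int = -1,
                    action_on_max_reject: str = "HOLD") -> str:
    """Return what happens to a job that has just been rejected reject_count times."""
    if action_on_max_reject not in OUTCOMES:
        raise ValueError(f"unknown ACTION_ON_MAX_REJECT value: {action_on_max_reject}")
    # MAX_JOB_REJECT = -1 (the default) allows unlimited redispatching.
    if max_job_reject == -1 or reject_count < max_job_reject:
        return "redispatch"
    # Assumption: the action applies as soon as the count reaches the limit.
    return OUTCOMES[action_on_max_reject]
```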

NEGOTIATOR_INTERVAL = number
Where number specifies the interval, in seconds, at which the negotiator daemon performs a "negotiation loop" during which it attempts to assign available machines to waiting jobs. A negotiation loop also occurs whenever job states or machine states change. The default is 30 seconds.

NEGOTIATOR_CYCLE_DELAY = number
Where number specifies the time, in seconds, that the negotiator delays between periods when it attempts to schedule jobs. The negotiator daemon uses this time to respond to queries, reorder job queues, collect information about changes in the states of jobs, and so on. Delaying the scheduling of jobs might improve the overall performance of the negotiator by preventing it from spending excessive time attempting to schedule jobs. NEGOTIATOR_CYCLE_DELAY must be less than NEGOTIATOR_INTERVAL. The default is 0 seconds.

NEGOTIATOR_LOADAVG_INCREMENT = number
Where number specifies the value the negotiator adds to the startd machine's load average whenever a job in the Pending state is queued on that machine. This value is used to compensate for the increased load caused by starting another job. The default value is 0.5.
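For example, with the default increment of 0.5, a machine reporting a load average of 1.0 with two Pending jobs queued on it would be treated as having a load average of 2.0. The following Python sketch shows that arithmetic. It assumes the increment is added once per Pending job, and the function name is hypothetical.

```python
NEGOTIATOR_LOADAVG_INCREMENT = 0.5  # documented default


def projected_load(reported_loadavg: float, pending_jobs: int,
                   increment: float = NEGOTIATOR_LOADAVG_INCREMENT) -> float:
    """Load average the negotiator assumes once pending_jobs are queued on a machine."""
    return reported_loadavg + increment * pending_jobs
```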

NEGOTIATOR_PARALLEL_DEFER = number
Where number specifies the amount of time in seconds that defines how long a job stays out of the queue after it fails to get the correct number of processors. This keyword applies only to the default LoadLeveler scheduler. This keyword must be greater than the NEGOTIATOR_INTERVAL value; if it is not, the default is used. The default, set internally by LoadLeveler, is NEGOTIATOR_INTERVAL multiplied by 5.

NEGOTIATOR_PARALLEL_HOLD = number
Where number specifies the amount of time in seconds that defines how long a job is given to accumulate processors. This keyword applies only to the default LoadLeveler scheduler. This keyword must be greater than the NEGOTIATOR_INTERVAL value; if it is not, the default is used. The default, set internally by LoadLeveler, is NEGOTIATOR_INTERVAL multiplied by 5.
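NEGOTIATOR_PARALLEL_DEFER and NEGOTIATOR_PARALLEL_HOLD follow the same rule: a value that is not greater than NEGOTIATOR_INTERVAL is replaced by the internal default of NEGOTIATOR_INTERVAL multiplied by 5. The following Python sketch encodes that rule. The function name is hypothetical, and None stands for a keyword that is not set.

```python
from typing import Optional


def effective_parallel_setting(value: Optional[int],
                               negotiator_interval: int = 30) -> int:
    """Effective NEGOTIATOR_PARALLEL_DEFER or NEGOTIATOR_PARALLEL_HOLD, in seconds."""
    default = 5 * negotiator_interval  # internal default: NEGOTIATOR_INTERVAL * 5
    # The keyword must be greater than NEGOTIATOR_INTERVAL; otherwise the default applies.
    if value is None or value <= negotiator_interval:
        return default
    return value
```

With the default NEGOTIATOR_INTERVAL of 30 seconds, an unset keyword or a value of 30 or less yields 150 seconds.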

NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = number
Where number specifies the amount of time in seconds between calculations of the SYSPRIO values for waiting jobs. The default is 120 seconds. Recalculating the priority can be CPU-intensive; specifying low values for the NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL keyword may lead to a heavy CPU load on the negotiator if a large number of jobs are running or waiting for resources. A value of 0 means the SYSPRIO values are not recalculated.

You can use this keyword to base the order in which jobs are run on the current number of running, queued, or total jobs for a user or a group. For more information, see Step 6: Prioritize the queue maintained by the negotiator.

NEGOTIATOR_REJECT_DEFER = number
Where number specifies the amount of time in seconds the negotiator waits before it considers scheduling a job to a machine that recently rejected the job. The default is 120 seconds. (For related information, see the MAX_JOB_REJECT keyword in this section.)

NEGOTIATOR_REMOVE_COMPLETED = number
Where number is the amount of time in seconds that you want the negotiator to keep information regarding completed and removed jobs so that you can query this information using the llq command. The default is 0 seconds.

NEGOTIATOR_RESCAN_QUEUE = number
Where number specifies how long, in seconds, the negotiator waits before it rescans the job queue for machines that have bypassed jobs that could not run because of conditions that may change over time. This keyword must be greater than the NEGOTIATOR_INTERVAL value; if it is not, the default is used. The default is 900 seconds.

OBITUARY_LOG_LENGTH = number
Where number specifies how many lines from the end of the log file are appended to the mail message that the master daemon sends to the LoadLeveler administrators when one of the daemons dies. The default is 25.
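Keeping only the last OBITUARY_LOG_LENGTH lines of a log is a bounded-queue operation. The following Python sketch uses collections.deque with maxlen to do this. It illustrates the effect of the keyword and is not the master daemon's implementation. The function name is hypothetical.

```python
from collections import deque
from typing import Iterable, List


def obituary_tail(log_lines: Iterable[str], length: int = 25) -> List[str]:
    """Return the last `length` lines (OBITUARY_LOG_LENGTH, default 25)."""
    # A deque with maxlen discards the oldest lines as new ones arrive,
    # so memory stays bounded even for a very long log.
    return list(deque(log_lines, maxlen=length))
```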

POLLING_FREQUENCY = number
Where number specifies the interval, in seconds, at which the startd daemon evaluates the load on the local machine and decides whether to suspend, resume, or abort jobs. This is also the minimum interval at which the kbdd daemon reports keyboard or mouse activity to the startd daemon. The default is 5 seconds.

POLLS_PER_UPDATE = number
Where number specifies how often, in POLLING_FREQUENCY intervals, the startd daemon updates the central manager. Because of the communication overhead, it is impractical to send updates as often as the POLLING_FREQUENCY keyword defines. Therefore, the startd daemon updates the central manager only on every nth local update, where n is the value specified for POLLS_PER_UPDATE. Change POLLS_PER_UPDATE when you change POLLING_FREQUENCY. The default is 24.
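Because the startd daemon updates the central manager once every POLLS_PER_UPDATE polls, the time between updates is POLLING_FREQUENCY multiplied by POLLS_PER_UPDATE. With the defaults, this is 5 * 24 = 120 seconds. The following Python sketch computes that period. The comparison with MACHINE_UPDATE_INTERVAL (300 seconds by default) is an assumption about why the two keywords should be changed together: raising POLLING_FREQUENCY to 15 without lowering POLLS_PER_UPDATE would stretch the period to 360 seconds, which is longer than the interval after which a machine is considered down. The function names are hypothetical.

```python
MACHINE_UPDATE_INTERVAL = 300  # seconds, the documented default


def central_manager_update_period(polling_frequency: int = 5,
                                  polls_per_update: int = 24) -> int:
    """Seconds between startd updates to the central manager."""
    return polling_frequency * polls_per_update


def fits_update_interval(polling_frequency: int = 5, polls_per_update: int = 24,
                         machine_update_interval: int = MACHINE_UPDATE_INTERVAL) -> bool:
    """Assumption: updates should arrive at least once per MACHINE_UPDATE_INTERVAL."""
    period = central_manager_update_period(polling_frequency, polls_per_update)
    return period <= machine_update_interval
```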

PUBLISH_OBITUARIES = true | false
Where true specifies that the master daemon sends mail to the administrator(s) identified by the LOADL_ADMIN keyword when any of the daemons it manages dies abnormally.

RESTARTS_PER_HOUR = number
Where number specifies how many times the master daemon attempts to restart a daemon that dies abnormally. Because one or more of the daemons may be unable to run due to a permanent error, the master attempts at most $(RESTARTS_PER_HOUR) restarts within a 60-minute period. If these attempts fail, it sends mail to the administrator(s) identified by the LOADL_ADMIN keyword and exits. The default is 12.
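The restart limit can be sketched as a sliding window over the times of recent restarts. The following Python sketch assumes a rolling 60-minute window rather than a fixed clock hour. The class name is hypothetical. What the caller does when a restart is refused (mailing the LOADL_ADMIN administrators and exiting) is only indicated in a comment.

```python
from collections import deque
from typing import Deque


class RestartLimiter:
    """Allow at most `per_hour` restarts within any rolling 60-minute window."""

    def __init__(self, per_hour: int = 12, window: float = 3600.0) -> None:
        self.per_hour = per_hour  # RESTARTS_PER_HOUR, default 12
        self.window = window      # 60 minutes, in seconds
        self._restarts: Deque[float] = deque()

    def try_restart(self, now: float) -> bool:
        """Record a restart at time `now` if the limit allows it."""
        # Forget restarts that fall outside the last 60 minutes.
        while self._restarts and now - self._restarts[0] >= self.window:
            self._restarts.popleft()
        if len(self._restarts) >= self.per_hour:
            return False  # the caller would mail the administrators and exit
        self._restarts.append(now)
        return True
```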

SCHEDD_INTERVAL = number
Where number specifies the interval, in seconds, at which the schedd daemon checks the local job queue and updates the negotiator daemon. The default is 60 seconds.

VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE | FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY
Specifies which algorithm the Central Manager uses to decide whether a machine has enough virtual memory to meet the image_size requirement of a job step. The default for this keyword is FREE_PAGING_SPACE. The LoadLeveler computed values of free real memory and free paging space should track very closely with the values reported by the svmon -G command.

If VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY and all other requirements are met, then a machine having 1.2 GB of free real memory and 0.9 GB of free paging space (2.1 GB free) should be able to start a job step that has an image size requirement of 2 GB (image_size = 2000000).
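The example above can be checked with a short Python sketch. It assumes image_size is expressed in kilobytes, as the figure of 2000000 for 2 GB suggests, and counts 1 GB as 1,000,000 KB to match that figure. It also treats an exact match as sufficient. The function name is hypothetical. Under FREE_PAGING_SPACE alone, the same machine, with 0.9 GB of free paging space, would not qualify.

```python
FREE_PAGING_SPACE = "FREE_PAGING_SPACE"
PLUS_REAL = "FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY"


def has_enough_virtual_memory(image_size_kb: int, free_paging_kb: int,
                              free_real_kb: int,
                              algorithm: str = FREE_PAGING_SPACE) -> bool:
    """Apply the selected VM_IMAGE_ALGORITHM to a job step's image_size."""
    if algorithm == FREE_PAGING_SPACE:
        available_kb = free_paging_kb
    elif algorithm == PLUS_REAL:
        available_kb = free_paging_kb + free_real_kb
    else:
        raise ValueError(f"unknown VM_IMAGE_ALGORITHM: {algorithm}")
    return available_kb >= image_size_kb


# 1.2 GB free real memory, 0.9 GB free paging space, image_size = 2000000
print(has_enough_virtual_memory(2_000_000, 900_000, 1_200_000, PLUS_REAL))  # True
print(has_enough_virtual_memory(2_000_000, 900_000, 1_200_000))             # False
```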

WALLCLOCK_ENFORCE = true | false
Where true specifies that the wall_clock_limit on the job will be enforced. The WALLCLOCK_ENFORCE keyword is only valid when the External Scheduler is enabled.

