Use the following keywords to define the characteristics of the LoadLeveler cluster:
When MACHINE_AUTHENTICATE is set to true, LoadLeveler verifies, for every communication between LoadLeveler processes, that the sending process is running on a machine identified by a machine stanza in the administration file. The validation captures the address of the sending machine when the accept function call is issued to accept a connection. The gethostbyaddr function is then called to translate the address to a name, and the name is matched against the list derived from the administration file.
| Note: | MACHINE_AUTHENTICATE must be set to true for Gang scheduling to work. For more information, see Restrictions for Gang scheduling and preemption. |
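The validation steps described above (capture the peer address, translate it to a name, match the name against the administration file list) can be sketched as follows. This is a minimal illustration, not LoadLeveler's implementation: the function name `is_known_machine`, the `known_machines` list, and the injectable `resolve` parameter are all hypothetical, and the case-insensitive match on the canonical name and its aliases is an assumption.

```python
import socket


def is_known_machine(peer_addr, known_machines, resolve=socket.gethostbyaddr):
    """Return True if peer_addr reverse-resolves to a name in known_machines.

    peer_addr      -- IP address string captured when the connection was accepted
    known_machines -- names taken from machine stanzas (hypothetical input)
    resolve        -- reverse-lookup function; defaults to socket.gethostbyaddr
    """
    try:
        # gethostbyaddr returns (canonical name, alias list, address list).
        hostname, aliases, _addresses = resolve(peer_addr)
    except OSError:
        # No reverse mapping: the sender cannot be matched to a stanza.
        return False
    # Assumption: compare names case-insensitively, including aliases.
    candidates = {hostname.lower(), *(alias.lower() for alias in aliases)}
    return any(name.lower() in candidates for name in known_machines)
```

In a server loop, `peer_addr` would be the first element of the address returned by `socket.accept()`. Passing a stub as `resolve` makes the check testable without a name service.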
This section discusses the types of schedulers available and the keywords (SCHEDULER_TYPE and SCHEDULER_API) used to define which scheduler LoadLeveler will use.
Use the following keywords to define your scheduler:
Notes:
The SCHEDULER_TYPE definitions are:
| Note: | If you change the scheduler from a specified SCHEDULER_TYPE to SCHEDULER_API=YES, you must stop and restart LoadLeveler using llctl. |
See Keyword considerations for parallel jobs for information on which keywords associated with parallel jobs are supported by the default scheduler.
For example, on a rack with 10 nodes, Job A is using 8 of the nodes. Job B has the highest priority in the queue and requires 10 nodes. Job C has the next highest priority in the queue and requires only two nodes. Job B must wait for Job A to finish so that it can use the freed nodes. Because Job A is using only 8 of the 10 nodes, the Backfill scheduler can schedule Job C on the two available nodes, provided that Job C finishes before Job A finishes (and Job B starts). To decide whether Job C has time to run, the Backfill scheduler uses Job C's wall_clock_limit value to determine whether it will finish before Job A ends. If Job C has a wall_clock_limit of unlimited, it may not finish before Job B's start time, so it is not dispatched.
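The decision in this example can be sketched as a small check. This is an illustration under stated assumptions, not the scheduler's actual algorithm: the function `can_backfill` and its parameters are hypothetical, times are plain numbers in one common unit, and `math.inf` stands in for a wall_clock_limit of unlimited.

```python
import math


def can_backfill(free_nodes, nodes_needed, wall_clock_limit, now, reserved_start):
    """Decide whether a lower-priority job may run ahead of a waiting job.

    free_nodes       -- nodes currently idle (2 in the example above)
    nodes_needed     -- nodes the candidate job requires
    wall_clock_limit -- candidate's limit; math.inf models "unlimited"
    now              -- current time
    reserved_start   -- time the higher-priority job is expected to start
    """
    if nodes_needed > free_nodes:
        # The candidate does not fit in the idle nodes at all.
        return False
    # The candidate must be guaranteed to end before the reserved start time.
    return now + wall_clock_limit <= reserved_start


# Job A is assumed to end at t=100, so Job B's start is reserved for t=100.
print(can_backfill(2, 2, 60, now=0, reserved_start=100))        # Job C, limit 60
print(can_backfill(2, 2, math.inf, now=0, reserved_start=100))  # Job C, unlimited
```

With a limit of 60 the check passes. With an unlimited limit it fails, which matches the statement that such a job is not dispatched.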
The Backfill scheduler supports:
The above functions are not supported by the default LoadLeveler scheduler.
Note the following when using the Backfill scheduler:
See Keyword considerations for parallel jobs for information on which keywords associated with parallel jobs are supported by the Backfill scheduler.
For more information on setting up Gang scheduling, see Using Gang scheduling.
You can use the file system keywords to monitor the file system space used by LoadLeveler for:
You can also use the file system keywords to take preventive action and avoid problems caused by running out of file system space. To do this, set how often LoadLeveler checks the file system free space, and set the lower and upper thresholds that trigger system responses to the amount of free space available. Setting a realistic span between the lower and upper thresholds avoids excessive system actions.
| Note: | If FS_INTERVAL is not specified but any of the other three keywords (FS_NOTIFY, FS_SUSPEND, or FS_TERMINATE) is specified, the FS_INTERVAL value defaults to 5 and the file system is checked. |
The FS_NOTIFY configuration file keyword defines when LoadLeveler notifies the administrator that there is a file system problem or that a file system problem has been resolved.
If the free space associated with the LoadLeveler file system drops below the lower threshold, LoadLeveler sends a mail message to the administrator indicating that logging problems may occur. When file system free space rises above the upper threshold (after passing the lower threshold), LoadLeveler sends a mail message to the administrator indicating that the problem has been resolved.
Default value (in blocks): 1000, -1
Valid values for both the lower and upper thresholds are -1 and all positive integers. If a threshold is set to -1, the transition across that threshold is not checked.
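The lower and upper thresholds described above act as a hysteresis band: a problem is reported once when free space falls below the lower threshold, and a resolution is reported only after free space later rises above the upper threshold. The sketch below illustrates that behavior. The class `ThresholdMonitor` and its return values are hypothetical, and treating -1 as "never report this transition" is an interpretation of the text rather than confirmed LoadLeveler behavior.

```python
NOT_CHECKED = -1  # threshold value meaning "do not check this transition"


class ThresholdMonitor:
    """Track free space against a lower/upper threshold pair (in blocks)."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self.in_problem = False  # True after free space has passed the lower threshold

    def update(self, free_blocks):
        """Return "problem", "resolved", or None for one free-space sample."""
        if (not self.in_problem and self.lower != NOT_CHECKED
                and free_blocks < self.lower):
            self.in_problem = True
            return "problem"
        if (self.in_problem and self.upper != NOT_CHECKED
                and free_blocks > self.upper):
            self.in_problem = False
            return "resolved"
        # Inside the band, or the relevant transition is not checked.
        return None


monitor = ThresholdMonitor(lower=1000, upper=5000)
events = [monitor.update(free) for free in (6000, 900, 800, 3000, 5001, 4000)]
print(events)
```

Samples that stay between the two thresholds produce no event, which is why a realistic span between them reduces repeated actions.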
The FS_SUSPEND configuration file keyword defines when LoadLeveler drains and resumes the schedd and startd daemons running on a node.
If the free space associated with the LoadLeveler file system drops below the lower threshold, LoadLeveler drains the schedd and the startd daemons if they are running on a node. When this happens, logging is turned off and mail notification is sent to the administrator.
When file system free space rises above the upper threshold (after passing the lower threshold), LoadLeveler signals the schedd and the startd daemons to resume. When this happens, logging is turned on and mail notification is sent to the administrator.
Default value (in blocks): -1, -1
Valid values for both the lower and upper thresholds are -1 and all positive integers. If a threshold is set to -1, the transition across that threshold is not checked.
The FS_TERMINATE configuration file keyword defines when LoadLeveler sends the SIGTERM signal to the Master daemon, which then terminates all LoadLeveler daemons running on the node.
If the free space associated with the LoadLeveler file system drops below the lower threshold, all LoadLeveler daemons are terminated.
| Note: | Although the upper threshold setting for FS_TERMINATE is ignored when LoadLeveler is terminated, the upper threshold is still required on the statement. |
Default value (in blocks): -1, -1
Valid values for the lower threshold are -1 and all positive integers. If the value is set to -1, the transition across the threshold is not checked.