Figure 4 illustrates the information flow through the LoadLeveler cluster:
Figure 4. High-level job flow
The managing machine in a LoadLeveler cluster is known as the central manager. Other machines in the cluster act as scheduling machines or as executing machines. The arrows in Figure 4 illustrate the following:
Figure 4 is broken down into the following more detailed diagrams illustrating how LoadLeveler processes a job.
Figure 5. Job is submitted to LoadLeveler
Figure 5 illustrates that the schedd daemon runs on the scheduling machine. This machine can also have the startd daemon running on it. The negotiator daemon resides on the central manager machine. The arrows in Figure 5 illustrate the following:
Figure 6. LoadLeveler authorizes the job
In Figure 6, arrow 4 indicates that the negotiator daemon authorizes the schedd daemon to begin taking steps to run the job. This authorization is called a permit to run. Once this is done, the job is considered Pending or Starting. (See LoadLeveler job states for more information.)
Figure 7. LoadLeveler prepares to run the job
In Figure 7, arrow 5 illustrates that the schedd daemon contacts the startd daemon on the executing machine and requests that it start the job. The executing machine can either be a local machine (the machine from which the job was submitted) or a remote machine (another machine in the cluster).
Figure 8. LoadLeveler starts the job
The arrows in Figure 8 illustrate the following:
The starter forks and executes the user's job, and the starter parent waits for the child to complete.
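The fork-and-wait pattern described above can be sketched in a few lines. This is only an illustration of the general POSIX technique, not LoadLeveler's starter code: the function name `run_job` and the exit-code conventions (127 when the program cannot be executed, a negative value when the child is killed by a signal) are assumptions made for this example.

```python
import os
import sys


def run_job(argv):
    """Fork a child that executes argv; the parent waits for it to finish.

    Hypothetical helper for illustration only (POSIX systems).
    Returns the child's exit code, or the negated signal number if the
    child was terminated by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: replace this process image with the user's job.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # The program could not be executed; leave without running
            # the parent's cleanup handlers.
            os._exit(127)

    # Parent process: block until the child completes.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Run a trivial "job" with the current Python interpreter.
    code = run_job([sys.executable, "-c", "print('job ran')"])
    print("exit code:", code)
```

The parent does no work of its own while the job runs; it only collects the child's termination status, which matches the description of the starter parent waiting for the child to complete.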
Figure 9. LoadLeveler completes the job
The arrows in Figure 9 illustrate the following:
As LoadLeveler processes a job, the job moves through various states. The possible job states are listed in Table 2 and described in detail in the appendix under Job states. For more information about the daemons that control these job states, see Daemons.
| Job state | Abbreviation |
|---|---|
| Canceled | CA |
| Checkpointing | CK |
| Completed | C |
| Complete Pending | CP |
| Deferred | D |
| Idle | I |
| Not Queued | NQ |
| Not Run | NR |
| Pending | P |
| Preempted | E |
| Preempt Pending | EP |
| Rejected | X |
| Reject Pending | XP |
| Removed | RM |
| Remove Pending | RP |
| Resume Pending | MP |
| Running | R |
| Starting | ST |
| System Hold | S |
| User & System Hold | HS |
| Terminated | TX |
| User Hold | H |
| Vacated | V |
| Vacate Pending | VP |
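
When reading status output in scripts, it can help to expand these abbreviations back into full state names. The following is a minimal sketch of such a lookup; the names `JOB_STATES` and `describe_state` are hypothetical and are not part of LoadLeveler. The mapping is copied from the table above.

```python
# Abbreviation -> job state name, copied from the job state table.
JOB_STATES = {
    "CA": "Canceled",
    "CK": "Checkpointing",
    "C": "Completed",
    "CP": "Complete Pending",
    "D": "Deferred",
    "I": "Idle",
    "NQ": "Not Queued",
    "NR": "Not Run",
    "P": "Pending",
    "E": "Preempted",
    "EP": "Preempt Pending",
    "X": "Rejected",
    "XP": "Reject Pending",
    "RM": "Removed",
    "RP": "Remove Pending",
    "MP": "Resume Pending",
    "R": "Running",
    "ST": "Starting",
    "S": "System Hold",
    "HS": "User & System Hold",
    "TX": "Terminated",
    "H": "User Hold",
    "V": "Vacated",
    "VP": "Vacate Pending",
}


def describe_state(abbreviation):
    """Return the full job state name, or "Unknown" if not in the table."""
    return JOB_STATES.get(abbreviation.strip().upper(), "Unknown")


if __name__ == "__main__":
    for abbr in ("I", "ST", "R", "C"):
        print(abbr, "=", describe_state(abbr))
```

Returning "Unknown" rather than raising an error is a design choice for this sketch, so that a script keeps working if it meets an abbreviation that is not listed in the table.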