Purpose
Returns information about job steps in the LoadLeveler queues.
Syntax
llq [-?] [-H] [-v] [-x] [-s] [-l] [-w] [joblist] [-u userlist] [-h hostlist] [-c classlist] [-f category_list] [-r category_list]
Flags
CPU usage and other resource consumption information on active jobs can only be reported using the -x flag if the LoadLeveler administrator has enabled it by specifying A_ON and A_DETAIL for the ACCT keyword in the LoadLeveler configuration file.
Normally, llq connects with the central manager to obtain job information. When you specify -x, llq connects to the schedd machine that received the specified job to get extended job information. However, some statistics, including those corresponding to System Priority and q_sysprio, are available only from the central manager. Do not use the -x option if you need these statistics.
When -x is specified without -l, CPU usage for active jobs is reported in the short format.
Note: Using both the -l and -x options without a joblist specification can produce a very long report and excessive network traffic.
If -l is not specified, then the standard listing is generated as shown in Results.
When the -w flag is specified with a single stepid, the -h flag can be used together with -w to specify a single hostname.
The following statistics are displayed for every node the job is running on:
If the job was submitted from a submit-only machine, this is the name of the machine where the schedd daemon that sent the job to the negotiator resides.
If neither the -u nor the -h option is specified, and no jobid is specified, then all jobs are queried.
The -u and -h options override the jobid parameters.
Examples
This example generates a long listing for job 8, job step 2 submitted to machine gold:
llq -l gold.8.2
This example generates a standard listing for all job steps of job name 12 submitted to the local machine:
llq 12
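The identifiers passed in these examples (gold.8.2 and 12) can be split into their parts by a script. The following is a minimal Python sketch, assuming the identifier has the form [host.]jobid[.stepid], that the job and step numbers are the trailing numeric components, and that a missing host means the local machine. The function name parse_job_identifier is hypothetical and is not part of LoadLeveler.

```python
def parse_job_identifier(identifier):
    """Split an identifier of the assumed form [host.]jobid[.stepid].

    Returns (host, job, step); host is None for the local machine and
    step is None when the identifier names every step of the job.
    """
    parts = identifier.strip().split(".")
    numbers = []
    # Peel off up to two trailing numeric components: job number, then step number.
    while parts and parts[-1].isdigit() and len(numbers) < 2:
        numbers.insert(0, int(parts.pop()))
    if not numbers:
        raise ValueError(f"no job number found in {identifier!r}")
    host = ".".join(parts) or None
    job = numbers[0]
    step = numbers[1] if len(numbers) == 2 else None
    return host, job, step


print(parse_job_identifier("gold.8.2"))
print(parse_job_identifier("12"))
```

Working from the right-hand end keeps fully qualified host names, which contain dots themselves, in one piece.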
Standard listing: The standard listing is generated when you do not specify the -l option with the llq command. The following is sample output from the llq -h mars command, where the machine mars has two jobs running and one job waiting:
+--------------------------------------------------------------------------------+
|Id                       Owner      Submitted   ST PRI Class        Running On  |
|------------------------ ---------- ----------- -- --- ------------ ----------- |
|mars.498.0               brownap     5/20 11:31 R  100 silver       mars        |
|mars.499.0               brownap     5/20 11:31 R  50  No_Class     mars        |
|mars.501.0               brownap     5/20 11:31 I  50  silver                   |
|                                                                                |
|3 job step(s) in query, 1 waiting, 0 pending, 2 running, 0 held, 0 preempted    |
+--------------------------------------------------------------------------------+
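A script that consumes the standard listing has to cope with the two-token Submitted column and with an empty Running On column for job steps that are not running. The following Python sketch shows one way this might be done. It assumes the command output carries no box frame, that the columns appear in the order shown in the sample, and that the summary line uses the state words shown in the sample. The name parse_standard_listing is hypothetical.

```python
import re

SAMPLE = """\
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
mars.498.0               brownap     5/20 11:31 R  100 silver       mars
mars.499.0               brownap     5/20 11:31 R  50  No_Class     mars
mars.501.0               brownap     5/20 11:31 I  50  silver

3 job step(s) in query, 1 waiting, 0 pending, 2 running, 0 held, 0 preempted
"""


def parse_standard_listing(text):
    """Return (jobs, summary) parsed from a standard llq listing."""
    jobs, summary = [], {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not tokens or tokens[0] == "Id" or set(line.strip()) <= {"-", " "}:
            continue
        if "job step(s)" in line:
            summary["total"] = int(tokens[0])
            pairs = re.findall(r"(\d+) (waiting|pending|running|held|preempted)", line)
            for count, state in pairs:
                summary[state] = int(count)
            continue
        # Id, Owner, date, time, ST, PRI, Class, then an optional host name.
        step_id, owner, date, time, state, priority, job_class = tokens[:7]
        jobs.append({
            "id": step_id,
            "owner": owner,
            "submitted": f"{date} {time}",
            "state": state,
            "priority": int(priority),
            "class": job_class,
            "running_on": tokens[7] if len(tokens) > 7 else None,
        })
    return jobs, summary


jobs, summary = parse_standard_listing(SAMPLE)
print(len(jobs), summary)
```

For scripted use, the raw format produced by the -r flag is usually easier to parse than this column layout, because its fields are explicitly delimited.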
The standard listing includes the following fields:
For a detailed explanation of job states, see Job states.
Customized, formatted standard listing: A customized and formatted standard listing is generated when you specify llq with the -f flag. The following is sample output from this command:
llq -f %id %c %dq %dd %gl %h
+--------------------------------------------------------------------------------+
|Step Id           Class      Queue Date  Disp. Date  LL Group   Running On      |
|----------------- ---------- ----------- ----------- ---------- --------------- |
|ll6.2.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com |
|ll6.1.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com |
|ll6.3.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll5.pok.ibm.com |
|                                                                                |
|3 job step(s) in queue, 0 waiting, 0 pending, 3 running, 0 held, 0 preempted    |
|                                                                                |
+--------------------------------------------------------------------------------+
Customized, unformatted standard listing: A customized and unformatted (raw) standard listing is generated when you specify llq with the -r flag. Output fields are separated by an exclamation point (!). The following is sample output from this command:
llq -r %id %c %dq %dd %gl %h
+--------------------------------------------------------------------------------+
|ll6.pok.ibm.com.2.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll6.pok.ibm&|
|ll6.pok.ibm.com.1.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll6.pok.ibm&|
|ll6.pok.ibm.com.3.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll5.pok.ibm&|
+--------------------------------------------------------------------------------+
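Because the raw listing separates fields with an exclamation point, it can be split into records without relying on column positions. The following Python sketch illustrates this under a few assumptions: the field names in FIELDS are hypothetical labels chosen here for the six requested categories, the host names in the sample data are written out in full (the sample output above shows the last field truncated), and dates are read as month/day/year.

```python
from datetime import datetime

# Hypothetical labels for the categories %id %c %dq %dd %gl %h, in order.
FIELDS = ["id", "class", "queue_date", "dispatch_date", "group", "host"]

SAMPLE = (
    "ll6.pok.ibm.com.2.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll6.pok.ibm.com\n"
    "ll6.pok.ibm.com.1.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll6.pok.ibm.com\n"
    "ll6.pok.ibm.com.3.0!No_Class!04/08/2001 09:19!04/08/2001 09:21!No_Group!ll5.pok.ibm.com\n"
)


def parse_raw_listing(text, fields):
    """Turn '!'-delimited llq -r output into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.split("!")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {line!r}")
        records.append(dict(zip(fields, values)))
    return records


records = parse_raw_listing(SAMPLE, FIELDS)
date_format = "%m/%d/%Y %H:%M"  # assumed month/day/year ordering
queued = datetime.strptime(records[0]["queue_date"], date_format)
dispatched = datetime.strptime(records[0]["dispatch_date"], date_format)
print(records[0]["id"], "waited", (dispatched - queued).total_seconds(), "seconds")
```

Checking the field count on every line makes the sketch fail loudly if a field value ever contains the delimiter itself.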
WLM CPU and real memory statistics listing: If the LoadLeveler interface to AIX Workload Manager (WLM) is enabled, then the -w option can be used to obtain CPU and real memory statistics for job steps in the Running state. The following is the output of the llq -w c209f1n05.13.0 command, where c209f1n05.13.0 is a CPU-intensive parallel job step currently running on the two nodes c209f1n05 and c209f1n01:
+--------------------------------------------------------------------------------+
| =============== Job Step c209f1n05.ppd.pok.ibm.com.13.0 =============== |
| c209f1n05.ppd.pok.ibm.com: |
|   Resource: CPU |
|     snapshot: 99 |
|     total: 80172 |
|   Resource: Real Memory |
|     snapshot: 1 |
|     high water: 2561 |
| |
| c209f1n01.ppd.pok.ibm.com: |
|   Resource: CPU |
|     snapshot: 100 |
|     total: 79303 |
|   Resource: Real Memory |
|     snapshot: 1 |
|     high water: 1919 |
| |
| |
+--------------------------------------------------------------------------------+
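The -w output groups its values by host and then by resource, so it maps naturally onto a nested dictionary. The following Python sketch shows one possible way to collect it. It assumes the output has no box frame, that a line ending in a colon with no value names a host, and that every statistic is an integer, as in the sample. The name parse_wlm_listing is hypothetical.

```python
SAMPLE = """\
=============== Job Step c209f1n05.ppd.pok.ibm.com.13.0 ===============
c209f1n05.ppd.pok.ibm.com:
  Resource: CPU
    snapshot: 99
    total: 80172
  Resource: Real Memory
    snapshot: 1
    high water: 2561

c209f1n01.ppd.pok.ibm.com:
  Resource: CPU
    snapshot: 100
    total: 79303
  Resource: Real Memory
    snapshot: 1
    high water: 1919
"""


def parse_wlm_listing(text):
    """Return {host: {resource: {statistic: value}}} from llq -w output."""
    stats = {}
    host = resource = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "=== Job Step ... ===" banner.
        if not line or line.startswith("="):
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Resource":
            resource = value
            stats[host][resource] = {}
        elif value == "":
            host = key          # a bare "name:" line starts a new host
            stats[host] = {}
        else:
            stats[host][resource][key] = int(value)
    return stats


stats = parse_wlm_listing(SAMPLE)
for name, resources in stats.items():
    print(name, resources["CPU"]["snapshot"], resources["Real Memory"]["high water"])
```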
The output listing associated with the -w option includes these fields:
The long listing: The long listing is generated when you specify the -l option with the llq command. This section contains sample output for two llq commands: one querying a serial job and one querying a parallel job. Following the sample output is an explanation of all possible fields displayed by the llq command.
The following is sample output for the llq -l command for the serial job step c209f1n01.ppd.pok.ibm.com.2.0:
Figure 21. llq -l output for a serial job step
+--------------------------------------------------------------------------------+
|=============== Job Step c209f1n01.ppd.pok.ibm.com.2.0 =============== |
| Job Step Id: c209f1n01.ppd.pok.ibm.com.2.0 |
| Job Name: c209f1n01.ppd.pok.ibm.com.2 |
| Step Name: job_step_1 |
| Structure Version: 10 |
| Owner: loadl |
| Queue Date: Wed Jul 25 15:49:17 EDT 2001 |
| Status: Running |
| Execution Factor: 1 |
| Dispatch Time: Wed Jul 25 15:49:17 EDT 2001 |
| Completion Date: |
| Completion Code: |
| User Priority: 50 |
| user_sysprio: 0 |
| class_sysprio: 35 |
| group_sysprio: 70 |
| System Priority: -33 |
| q_sysprio: -33 |
| Notifications: Complete |
| Virtual Image Size: 24 kb |
| Checkpointable: no |
| Ckpt Start Time: |
|Good Ckpt Time/Date: |
| Ckpt Elapse Time: 0 seconds |
|Fail Ckpt Time/Date: |
| Ckpt Accum Time: 0 seconds |
| Checkpoint File: |
| Restart From Ckpt: no |
| Restart Same Nodes: no |
| Restart: yes |
| Hold Job Until: |
| Cmd: /tmp/LL_V2/TEST/serial_90_60 |
| Args: arg_01 arg_02 arg_3 |
| Env: |
| In: /dev/null |
| Out: job1.c209f1n01.2.0.out |
| Err: job1.c209f1n01.2.0.err |
|Initial Working Dir: /tmp/LL_V2/TEST |
| Dependency: |
| Resources: ConsumableMemory(50.000 mb) ConsumableVirtualMemory(85.000 |
| Requirements: (Arch == "R6000") && (OpSys == "AIX51") && (Memory > 128) |
| Preferences: (Machine == { "c209f1n01.ppd.pok.ibm.com" "c209f1n05.ppd.pok.ib|
| |
+--------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------+
| && (Feature == "ESSL") |
| Step Type: Serial |
| Min Processors: |
| Max Processors: |
| Allocated Host: c209f1n01.ppd.pok.ibm.com |
| Node Usage: shared |
| Submitting Host: c209f1n01.ppd.pok.ibm.com |
| Notify User: loadl@c209f1n01.ppd.pok.ibm.com |
| Shell: /bin/ksh |
| LoadLeveler Group: chemistry |
| Class: large |
| Ckpt Hard Limit: undefined |
| Ckpt Soft Limit: undefined |
| Cpu Hard Limit: 02:30:00 (9000 seconds) |
| Cpu Soft Limit: 02:30:00 (9000 seconds) |
| Data Hard Limit: 5.500 gb (5905580032 bytes) |
| Data Soft Limit: 4.400 gb (4724464025 bytes) |
| Core Hard Limit: 8.000 gb (8589934592 bytes) |
| Core Soft Limit: 8.000 gb (8589934592 bytes) |
| File Hard Limit: 1.500 tb (1649267441664 bytes) |
| File Soft Limit: 1.200 tb (1319413953331 bytes) |
| Stack Hard Limit: 400.000 mb (419430400 bytes) |
| Stack Soft Limit: 300.000 mb (314572800 bytes) |
| Rss Hard Limit: 3.000 pb (3377699720527872 bytes) |
| Rss Soft Limit: 2.000 pb (2251799813685248 bytes) |
|Step Cpu Hard Limit: 04:00:30 (14430 seconds) |
|Step Cpu Soft Limit: 04:00:30 (14430 seconds) |
|Wall Clk Hard Limit: 00:11:40 (700 seconds) |
|Wall Clk Soft Limit: 00:10:00 (600 seconds) |
| Comment: Test job 1 of Serial test suite 3. |
| Account: 99999 |
| Unix Group: loadl |
| NQS Submit Queue: |
| NQS Query Queues: |
|Negotiator Messages: |
|Adapter Requirement: |
| Step Cpus: 1 |
|Step Virtual Memory: 85.000 mb |
| Step Real Memory: 50.000 mb |
|Step Adapter Memory: 0 bytes |
| |
+--------------------------------------------------------------------------------+
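The size limits in this sample pair a scaled value with its byte count, for example 4.400 gb (4724464025 bytes). The following Python sketch reproduces that arithmetic under the assumption, inferred from the sample values only, that the units are binary multiples (kb = 2^10 bytes, mb = 2^20, and so on) and that fractional bytes are truncated. The name size_to_bytes and the UNIT_EXPONENT table are hypothetical.

```python
from fractions import Fraction

# Assumed binary exponents for each unit suffix shown in the listing.
UNIT_EXPONENT = {"bytes": 0, "kb": 10, "mb": 20, "gb": 30, "tb": 40, "pb": 50, "eb": 60}


def size_to_bytes(text):
    """Convert a value such as '4.400 gb' to a whole number of bytes."""
    number, unit = text.split()
    # Fraction parses the decimal string exactly, avoiding float rounding.
    exact = Fraction(number) * 2 ** UNIT_EXPONENT[unit.lower()]
    return int(exact)  # truncate any fractional byte


for sample in ("5.500 gb", "4.400 gb", "1.200 tb", "300.000 mb", "3.000 pb"):
    print(sample, "=", size_to_bytes(sample), "bytes")
```

This arithmetic agrees with the gb, tb, mb, and pb values in the sample listings. The eb values shown for the parallel job step differ from it in their low-order digits, so the sketch should not be relied on to reproduce those exactly.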
The following listing is sample output for the llq -l -x c209f1n05.1.0 command, where c209f1n05.1.0 is a parallel, non-checkpointing job step:
Figure 22. llq -l -x output for a parallel, non-checkpointing job step
+--------------------------------------------------------------------------------+
|=============== Job Step c209f1n05.ppd.pok.ibm.com.1.0 =============== |
| Job Step Id: c209f1n05.ppd.pok.ibm.com.1.0 |
| Job Name: c209f1n05.ppd.pok.ibm.com.1 |
| Step Name: parallel_job_step_1 |
| Structure Version: 10 |
| Owner: loadl |
| Queue Date: Wed Jul 25 17:49:51 EDT 2001 |
| Status: Running |
| Execution Factor: 1 |
| Dispatch Time: Wed Jul 25 17:49:51 EDT 2001 |
| Completion Date: |
| Completion Code: |
| User Priority: 50 |
| user_sysprio: 0 |
| class_sysprio: 45 |
| group_sysprio: 0 |
| System Priority: |
| q_sysprio: |
| Notifications: Complete |
| Virtual Image Size: 387 kb |
| Checkpointable: no |
| Ckpt Start Time: |
|Good Ckpt Time/Date: |
| Ckpt Elapse Time: 0 seconds |
|Fail Ckpt Time/Date: |
| Ckpt Accum Time: 0 seconds |
| Checkpoint File: |
| Restart From Ckpt: no |
| Restart Same Nodes: no |
| Restart: yes |
| Hold Job Until: |
| Env: MANPATH=/usr/local/man:/usr/share/man: ... |
| In: /dev/null |
| Out: poe5_1.c209f1n05.1.0.out |
| Err: poe5_1.c209f1n05.1.0.err |
|Initial Working Dir: /tmp/TEST/PARA |
| |
+--------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------+
| Dependency: |
| Resources: ConsumableMemory(75.000 mb) ConsumableVirtualMemory(125.000|
| Step Type: General Parallel |
| Node Usage: shared |
| Submitting Host: c209f1n05.ppd.pok.ibm.com |
| Notify User: loadl |
| Shell: /bin/ksh |
| LoadLeveler Group: No_Group |
| Class: Parallel |
| Ckpt Hard Limit: undefined |
| Ckpt Soft Limit: undefined |
| Cpu Hard Limit: 00:30:00 (1800 seconds) |
| Cpu Soft Limit: 00:25:00 (1500 seconds) |
| Data Hard Limit: 4.250 pb (4785074604081152 bytes) |
| Data Soft Limit: 1.500 tb (1649267441664 bytes) |
| Core Hard Limit: 2.250 tb (2473901162496 bytes) |
| Core Soft Limit: 1.250 tb (1374389534720 bytes) |
| File Hard Limit: 1.200 eb (1383505805528216384 bytes) |
| File Soft Limit: 1.100 eb (1268213655067531680 bytes) |
| Stack Hard Limit: 40.000 mb (41943040 bytes) |
| Stack Soft Limit: 30.000 mb (31457280 bytes) |
| Rss Hard Limit: 1.200 eb (1383505805528216384 bytes) |
| Rss Soft Limit: 5.500 pb (6192449487634432 bytes) |
|Step Cpu Hard Limit: 3+08:00:00 (288000 seconds) |
|Step Cpu Soft Limit: 23:59:59 (86399 seconds) |
|Wall Clk Hard Limit: 01:40:00 (6000 seconds) |
|Wall Clk Soft Limit: 01:40:00 (6000 seconds) |
| Comment: Test job 1 of Parallel test suite 5. |
| Account: 99999 |
| Unix Group: loadl |
| User Space Windows: 8 |
| NQS Submit Queue: |
| NQS Query Queues: |
|Negotiator Messages: |
|Adapter Requirement: (css0,LAPI,shared,US),(css0,MPI,shared,US) |
| Step Cpus: 4 |
|Step Virtual Memory: 500.000 mb |
| Step Real Memory: 300.000 mb |
|Step Adapter Memory: 8.000 mb (8388608 bytes) |
+--------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------+
|--------------- Detail for c209f1n05.ppd.pok.ibm.com.1.0 --------------- |
| Running Host: c209f1n05.ppd.pok.ibm.com |
| Machine Speed: 1.000000 |
| Starter User Time: 00:00:00.230000 |
|Starter System Time: 00:00:00.190000 |
| Starter Total Time: 00:00:00.420000 |
| Starter maxrss: 1972 |
| Starter ixrss: 8788 |
| Starter idrss: 13468 |
| Starter isrss: 0 |
| Starter minflt: 0 |
| Starter majflt: 0 |
| Starter nswap: 0 |
| Starter inblock: 0 |
| Starter oublock: 0 |
| Starter msgsnd: 0 |
| Starter msgrcv: 0 |
| Starter nsignals: 3 |
| Starter nvcsw: 82 |
| Starter nivcsw: 56 |
| Step User Time: 00:01:20.460000 |
| Step System Time: 00:00:00.790000 |
| Step Total Time: 00:01:21.250000 |
| Step maxrss: 4312 |
| Step ixrss: 52544 |
| Step idrss: 9308828 |
| Step isrss: 0 |
| Step minflt: 6941 |
| Step majflt: 0 |
| Step nswap: 0 |
| Step inblock: 0 |
| Step oublock: 0 |
| Step msgsnd: 0 |
| Step msgrcv: 0 |
| Step nsignals: 0 |
| Step nvcsw: 507 |
| |
+--------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------+
| Step nivcsw: 8515 |
|--------------------------------------------------------------------------------|
|Node |
|---- |
| |
| Name : |
| Requirements : (Arch == "R6000") && (OpSys == "AIX51") && (Memory > 128) |
| Preferences : (Machine == { "c209f1n01.ppd.pok.ibm.com" "c209f1n05.ppd.pok.ib|
| && (Feature == "ESSL") |
| Node minimum : 2 |
| Node maximum : 2 |
| Node actual : 2 |
| Allocated Hosts : c209f1n05.ppd.pok.ibm.com:RUNNING:css0(1,LAPI,US,1M),css0(2|
| css0(3,LAPI,US,1M),css0(4,MPI,US,1M) |
| + c209f1n01.ppd.pok.ibm.com:RUNNING:css0(1,LAPI,US,1M),css0(2|
| css0(3,LAPI,US,1M),css0(4,MPI,US,1M) |
| |
| Master Task |
| ----------- |
| |
| Executable : /bin/poe |
| Exec Args : /tmp/TEST/PARA/ivp_60 -euilib us -ilevel 6 -labelio yes -pm|
| Num Task Inst: 1 |
| Task Instance: c209f1n05:-1 |
| |
| Task |
| ---- |
| |
| Num Task Inst: 4 |
| Task Instance: c209f1n05:0:css0(1,LAPI,US,1M),css0(2,MPI,US,1M) |
| Task Instance: c209f1n05:1:css0(3,LAPI,US,1M),css0(4,MPI,US,1M) |
| Task Instance: c209f1n01:2:css0(1,LAPI,US,1M),css0(2,MPI,US,1M) |
| Task Instance: c209f1n01:3:css0(3,LAPI,US,1M),css0(4,MPI,US,1M) |
| |
+--------------------------------------------------------------------------------+
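The time limits in the long listing pair a formatted duration with its value in seconds, for example 3+08:00:00 (288000 seconds). The following Python sketch converts the formatted value, assuming the layout is [days+]hours:minutes:seconds as the sample values suggest. The name limit_to_seconds is hypothetical.

```python
def limit_to_seconds(text):
    """Convert '[days+]HH:MM:SS' to a number of seconds."""
    days = 0
    if "+" in text:
        # An optional leading day count is separated by a plus sign.
        day_part, text = text.split("+", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


for sample in ("3+08:00:00", "23:59:59", "02:30:00", "00:11:40"):
    print(sample, "=", limit_to_seconds(sample), "seconds")
```

The results match the values in parentheses in the sample listings, which is a quick way to check a parser against real output.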
The long listing includes the fields shown in the sample output above. The Status field reports one of the following job states:
Canceled
Checkpointing
Completed
Complete Pending
Deferred
Idle
Not Queued
Not Run
Pending
Preempted (user-initiated)
Preempted (system-initiated)
Preempt Pending (user-initiated)
Preempt Pending (system-initiated)
Rejected
Reject Pending
Removed
Remove Pending
Resume Pending
Running
Starting
Submission Error
System Hold
System and User Hold
Terminated
User Hold
Vacated
Vacate Pending
Note: For a detailed explanation of these job states, see Job states.
When both the -x and -l options are specified, llq also displays the information listed below. If several LoadL_starter processes are used to run the job step, the reported values are either cumulative totals or maximum values. The same is true for the processes of the job step.
Other fields displayed for parallel jobs are:
hostname:task status:adapter usage, ... ,adapter usage + ... + hostname:task status:adapter usage, ... ,adapter usage
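The format above can be split into one record per host. The following Python sketch is one way to do so, assuming that host entries are separated by a plus sign, that the first two colons in an entry delimit the host name and task status, and that each adapter usage has the form name(...) as in the sample output. The sample string is illustrative and written out in full, and the name parse_allocated_hosts is hypothetical.

```python
import re

# Illustrative value; the adapter usages follow the pattern seen in the sample.
SAMPLE = (
    "c209f1n05.ppd.pok.ibm.com:RUNNING:css0(1,LAPI,US,1M),css0(2,MPI,US,1M)"
    " + c209f1n01.ppd.pok.ibm.com:RUNNING:css0(1,LAPI,US,1M),css0(2,MPI,US,1M)"
)


def parse_allocated_hosts(value):
    """Split an Allocated Hosts value into host, status, and adapter usages."""
    hosts = []
    for entry in value.split("+"):
        entry = entry.strip()
        if not entry:
            continue
        # Pad so that entries without a status or adapter usage still unpack.
        host, status, usage = (entry.split(":", 2) + ["", ""])[:3]
        # Commas also appear inside the parentheses, so match whole
        # "name(...)" groups instead of splitting on commas.
        adapters = re.findall(r"[^,(\s]+\([^)]*\)", usage)
        hosts.append({"host": host, "status": status, "adapters": adapters})
    return hosts


for record in parse_allocated_hosts(SAMPLE):
    print(record["host"], record["status"], record["adapters"])
```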