IBM Books

IBM LoadLeveler for AIX 5L: Using and Administering

llckpt - Checkpoint a running job step

Purpose

Checkpoints a single job step.

Note: Before you consider using the Checkpoint/Restart function, refer to the LoadL.README file in /usr/lpp/LoadL/READMES for information on the availability and support of this function.

Syntax

llckpt { -? | -H | -v | [-k | -u] [-r] [-q] <jobstep>}

Flags

-?
Provides a short usage message.

-H
Provides extended help information.

-v
Outputs the name of the command, release number, service level, service level date, and operating system used to build the command.

-k
Specifies that the job step is to be terminated after a successful checkpoint. The default is for the job step to continue running. The -k and -u flags cannot be used together.

-u
Specifies that the job step is to be put on user hold after a successful checkpoint. The default is for the job step to continue running. The -k and -u flags cannot be used together.

-r
Specifies that the command is to return without waiting for the checkpoint to complete. When you use this flag, the command cannot report whether the checkpoint succeeded or failed. The default is for the command to wait for the checkpoint to complete before returning.

-q
Specifies quiet mode: the command prints no messages other than error messages.

jobstep
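Specifies the name of the job step to be checkpointed, in the form host.jobid.stepid. As the Examples section shows, host is the machine that scheduled the job, jobid identifies the job, and stepid identifies the job step within that job.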
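
The host.jobid.stepid form can be checked before llckpt is called. The following is a minimal sketch, not part of LoadLeveler: parse_jobstep is a hypothetical helper, and it assumes that jobid and stepid are the last two dot-separated fields and are numeric, as in the examples in this section. The host part may itself contain dots.

```shell
# Hypothetical helper (not a LoadLeveler command): split a job step name
# of the form host.jobid.stepid into its three parts.
# Assumption: jobid and stepid are numeric and are the last two fields.
parse_jobstep() {
    name=$1
    stepid=${name##*.}                  # text after the last dot
    rest=${name%.*}                     # everything before the last dot
    [ "$rest" != "$name" ] || return 1  # no dot at all
    jobid=${rest##*.}                   # field before stepid
    host=${rest%.*}                     # everything before jobid
    [ "$host" != "$rest" ] || return 1  # only one dot
    [ -n "$host" ] || return 1          # empty host part
    case $jobid in ''|*[!0-9]*) return 1 ;; esac
    case $stepid in ''|*[!0-9]*) return 1 ;; esac
    printf '%s %s %s\n' "$host" "$jobid" "$stepid"
}
```

The function prints the three parts separated by spaces and returns a nonzero status when the name does not fit the assumed form.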

Description

Use the llckpt command to save the state of a running job step so that the work done so far is not lost if the job does not complete. A checkpointed job can later be restarted from the checkpoint file rather than from the beginning of the job. To restart a job from a checkpoint file, use the original job command file with the restart_from_ckpt keyword set to yes. The location and name of the checkpoint file are specified by the ckpt_dir and ckpt_file keywords, respectively.
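
As an illustration of the restart settings described above, the checkpoint-related lines of a job command file might look like the following sketch. The three keywords come from this section; the directory and file name are placeholders chosen for the example, and the remaining lines of the original job command file are assumed to stay unchanged.

```shell
# Checkpoint-related lines of a job command file (values are placeholders).
# The rest of the original job command file stays as it was.
# @ restart_from_ckpt = yes
# @ ckpt_dir          = /ckpt/myjob
# @ ckpt_file         = myjob.ckpt
```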

Examples

This example checkpoints job step 1 of job 12, which was scheduled by the machine named iron. Upon successful completion of the checkpoint, the job step returns to the RUNNING state:

llckpt iron.12.1

This example checkpoints job step 3 of job 14, which was scheduled by the machine named bronze. Upon successful completion of the checkpoint, the job step is put on user hold:

llckpt -u bronze.14.3
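
Because -k and -u cannot be used together, a script that issues llckpt may find it convenient to select the action that follows the checkpoint with a single argument. The following sketch assumes a hypothetical function, llckpt_cmdline, and hypothetical mode names (continue, terminate, hold); it only prints the command line and does not run llckpt.

```shell
# Hypothetical wrapper (not part of LoadLeveler): print the llckpt command
# line for a job step. Taking a single mode argument makes it impossible
# to request -k and -u at the same time.
#   continue  -> no flag (job step keeps running after the checkpoint)
#   terminate -> -k
#   hold      -> -u
llckpt_cmdline() {
    mode=$1
    jobstep=$2
    case $mode in
        continue)  flag= ;;
        terminate) flag=-k ;;
        hold)      flag=-u ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    [ -n "$jobstep" ] || { echo "missing job step name" >&2; return 2; }
    echo "llckpt${flag:+ $flag} $jobstep"
}
```

For example, llckpt_cmdline hold bronze.14.3 prints the command shown in the second example above.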

Results

When the -r option is not used, the llckpt command waits for the checkpoint to complete. Immediately after the command llckpt iron.12.1 is issued, the following message is displayed:

llckpt: The llckpt command will wait for the results of the checkpoint on 
job step iron.12.1 before returning

Once the checkpoint has successfully completed, the following message is displayed:

llckpt: Checkpoint of job step iron.12.1 completed successfully

If there is a problem taking the checkpoint, the second message has this form instead:

llckpt: Checkpoint FAILED for job step iron.12.1 with the following error:
primary error code = <numeric error number>, 
secondary error code = <secondary numeric error/extended numeric error>, 
error msg len = <length of message>, error msg = <text describing the error>

where primary error code is defined in /usr/include/sys/errno.h and secondary error code is defined in /usr/include/sys/chkerror.h.
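
A script that captures this failure message may want the primary error code so that it can be looked up in errno.h. The following is a minimal sketch that assumes the message text has exactly the form shown above and is supplied on standard input; primary_error_code is a hypothetical helper name. The secondary error code is not parsed because its exact format is not specified here.

```shell
# Hypothetical helper (not part of LoadLeveler): read llckpt output on
# standard input and print the numeric primary error code, if present.
# Assumption: the text "primary error code = <number>" appears as shown
# in the failure message above.
primary_error_code() {
    sed -n 's/.*primary error code = \([0-9][0-9]*\).*/\1/p'
}
```

The function prints nothing when the input contains no failure message.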

The -r option is used to return without waiting for the result of a checkpoint. The following output is displayed for the command llckpt -r bronze.14.3:

llckpt: The llckpt command will not wait for the checkpoint of 
job step bronze.14.3 to complete before returning.

Because of communication delays between LoadLeveler daemons, the command may learn that the checkpoint has ended before the status information arrives. In that case the checkpoint has completed, but whether it succeeded or failed is not known, and the following message is displayed:

llckpt: Checkpoint of job step iron.12.1 completed. No status information is available.
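
The messages in this section can be told apart by their text. The following sketch classifies captured llckpt output into one of the outcomes described above. The function name classify_llckpt_output and the result words are hypothetical, and the matching relies only on the message wording shown in this section.

```shell
# Hypothetical helper (not part of LoadLeveler): read llckpt output on
# standard input and print a one-word classification of the result.
# Outcome messages are tested first because the "will wait" notice
# precedes them when -r is not used.
classify_llckpt_output() {
    output=$(cat)
    case $output in
        *"completed successfully"*)             echo success ;;
        *"Checkpoint FAILED"*)                  echo failed ;;
        *"No status information is available"*) echo unknown ;;
        *"will not wait"*)                      echo not-waited ;;
        *)                                      echo unrecognized ;;
    esac
}
```

One possible use, assuming the messages may be written to either standard output or standard error, is llckpt iron.12.1 2>&1 | classify_llckpt_output.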

