SPEEDSHOP(1)

NAME

SpeedShop - an integrated package of performance tools

DESCRIPTION

SpeedShop is the generic name for an integrated package of performance tools that run performance experiments on executables and examine the results of those experiments. It also supports starting a process in a state that permits a debugger to attach to it, and running Purify on executables.

Purify and some experiments require instrumentation; when they do, it is performed automatically, and the resulting instrumented executable is run to generate the data.

SUPPORTED EXECUTABLES

SpeedShop works under IRIX 6.2 or later, and supports executables compiled with the IRIX 6.2 compilers (o32, n32 and 64) or with the MIPSpro 7.0 compilers (n32 and 64). SpeedShop supports C, C++, FORTRAN, Ada, and assembly-language programs. Programs must be built using shared libraries (DSOs); nonshared or stripped executables are not supported.

RECORDING EXPERIMENTS

Experiments are recorded using the ssrun(1) command, as follows:
ssrun -<exptype> <a.out-name> <a.out arguments>
where <exptype> is one of the named experiments listed below.

The result of an experiment is one or more files that are named by the following convention:
<a.out-name>.<exptype>.<pid>
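
As an illustration of the naming convention, the following shell sketch composes the expected file name from its three parts. The function name ss_outfile and the sample values are hypothetical, chosen only for this example; nothing here is provided by SpeedShop itself.

```shell
# Compose an experiment file name as <a.out-name>.<exptype>.<pid>.
# ss_outfile is a hypothetical helper, not part of SpeedShop.
ss_outfile() {
    printf '%s.%s.%s\n' "$1" "$2" "$3"
}

ss_outfile a.out pcsamp 1234    # prints a.out.pcsamp.1234
```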

To start the target process running, and leave it in a state in which a debugger can attach to it, add the -hang flag:
ssrun -hang -<exptype> <a.out-name> <a.out arguments>

To get more detailed information about the run, add the -v flag:
ssrun -v -<exptype> <a.out-name> <a.out arguments>
-or-
ssrun -v -hang -<exptype> <a.out-name> <a.out arguments>
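
The command forms above can be assembled mechanically. The sketch below only prints an ssrun command line and never runs it; ssrun_cmdline is a hypothetical helper name, and it always emits -v before -hang because that is the only ordering this page shows.

```shell
# Print (do not run) an ssrun command line for a performance experiment.
# ssrun_cmdline is a hypothetical helper, not part of SpeedShop.
# usage: ssrun_cmdline [-v] [-hang] <exptype> <a.out-name> [a.out arguments]
ssrun_cmdline() {
    v=""
    hang=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -v)    v=" -v";       shift ;;
            -hang) hang=" -hang"; shift ;;
            *)     break ;;
        esac
    done
    if [ $# -lt 2 ]; then
        echo "usage: ssrun_cmdline [-v] [-hang] <exptype> <a.out-name> [args]" >&2
        return 1
    fi
    exptype=$1
    shift
    # Emit the flags in the order documented above: -v, then -hang.
    echo "ssrun$v$hang -$exptype $*"
}

ssrun_cmdline -v -hang usertime ./a.out input.dat
# prints: ssrun -v -hang -usertime ./a.out input.dat
```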

To run Purify on an executable, use:
ssrun -purify <a.out-name> <a.out arguments>

Purify and performance experiments are mutually exclusive.

EXPERIMENT TYPES

The following experiment types, specified by <exptype> above, are supported in the current release:

usertime
uses statistical callstack profiling, based on wall clock time, with a time sample interval of 30 milliseconds. Notes: o32 executables must explicitly link with -lexc for these experiments to work. Experiments on n32 and n64 executables often fail with a core dump in libexc. Program execution often shows significant slowdown compared to the original executable. The stack unwind code often fails to completely unwind the stack; when it does, callers cannot be attributed beyond the point of failure.

[f]pcsamp[x]
uses statistical PC sampling, using 16-bit bins, based on user and system time, with a sample interval of 10 milliseconds. If the optional f prefix is specified, a sample interval of 1 millisecond will be used. If the optional x suffix is specified, a 32-bit bin size will be used.

ideal
uses basic-block counting, done by instrumenting the executable. Note: prof -gprof often fails with a fatal error when generating a report on an ideal-time experiment. In such cases, run prof without the -gprof option.

fpe
traces all floating-point exceptions.
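
The [f]pcsamp[x] naming scheme described above yields four experiment names. The following sketch spells out what each one selects; pcsamp_params is a hypothetical helper, and the values are taken directly from the description of the experiment.

```shell
# Decode a [f]pcsamp[x] experiment name into its sampling parameters.
# pcsamp_params is a hypothetical helper, not part of SpeedShop.
pcsamp_params() {
    case "$1" in
        pcsamp)   echo "interval=10ms bins=16-bit" ;;
        fpcsamp)  echo "interval=1ms bins=16-bit" ;;   # f prefix: faster sampling
        pcsampx)  echo "interval=10ms bins=32-bit" ;;  # x suffix: wider bins
        fpcsampx) echo "interval=1ms bins=32-bit" ;;
        *) echo "not a pcsamp experiment: $1" >&2; return 1 ;;
    esac
}

pcsamp_params fpcsampx    # prints interval=1ms bins=32-bit
```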

On machines with hardware performance counters (R10000 machines), the following additional types are supported:

[f]gi_hwc
uses statistical PC sampling, based on overflows of the graduated-instruction counter, at an overflow interval of 32771. If the optional f prefix is used, the overflow interval will be 6553.

[f]cy_hwc
uses statistical PC sampling, based on overflows of the cycle counter, at an overflow interval of 16411. If the optional f prefix is used, the overflow interval will be 3779.

[f]ic_hwc
uses statistical PC sampling, based on overflows of the primary instruction-cache miss counter, at an overflow interval of 2053. If the optional f prefix is used, the overflow interval will be 419.

[f]isc_hwc
uses statistical PC sampling, based on overflows of the secondary instruction-cache miss counter, at an overflow interval of 131. If the optional f prefix is used, the overflow interval will be 29.

[f]dc_hwc
uses statistical PC sampling, based on overflows of the primary data-cache miss counter, at an overflow interval of 2053. If the optional f prefix is used, the overflow interval will be 419.

[f]dsc_hwc
uses statistical PC sampling, based on overflows of the secondary data-cache miss counter, at an overflow interval of 131. If the optional f prefix is used, the overflow interval will be 29.

[f]tlb_hwc
uses statistical PC sampling, based on overflows of the TLB miss counter, at an overflow interval of 257. If the optional f prefix is used, the overflow interval will be 53.

[f]gfp_hwc
uses statistical PC sampling, based on overflows of the graduated floating-point instruction counter, at an overflow interval of 32771. If the optional f prefix is used, the overflow interval will be 6553.

prof_hwc
uses statistical PC sampling, based on overflows of the counter specified by the environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval given by the environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW. Note that these environment variables cannot be used to override the counter number or interval for the other defined experiments.
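
For quick reference, the preset overflow intervals listed above can be gathered into one lookup. This is a sketch only: hwc_overflow is a hypothetical helper name, and the numbers are copied from the experiment descriptions. prof_hwc is deliberately absent because its interval comes from the environment.

```shell
# Look up the preset overflow interval of a hardware-counter experiment.
# hwc_overflow is a hypothetical helper, not part of SpeedShop.
hwc_overflow() {
    case "$1" in
        gi_hwc|gfp_hwc)    echo 32771 ;;
        fgi_hwc|fgfp_hwc)  echo 6553 ;;
        cy_hwc)            echo 16411 ;;
        fcy_hwc)           echo 3779 ;;
        ic_hwc|dc_hwc)     echo 2053 ;;
        fic_hwc|fdc_hwc)   echo 419 ;;
        isc_hwc|dsc_hwc)   echo 131 ;;
        fisc_hwc|fdsc_hwc) echo 29 ;;
        tlb_hwc)           echo 257 ;;
        ftlb_hwc)          echo 53 ;;
        *) echo "no preset overflow interval for: $1" >&2; return 1 ;;
    esac
}

hwc_overflow fcy_hwc    # prints 3779
```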

One additional experiment type may be recorded, but no report generation for it is supported. It is:

heap
traces all calls to malloc, free, and related routines, and also supports various options for debugging heap usage.

Custom experiments will be supported in future releases.

REPORT GENERATION

Report generation is done through the prof(1) command:
prof <output file> ... <output file>
prof adds together the data from all of the output files, and produces a listing that depends on the particular experiment type.
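
Because an experiment may leave several data files behind, it can be convenient to collect them by name before handing them to prof. The sketch below prints a prof command line rather than running it. prof_cmdline is a hypothetical helper; it assumes the files follow the <a.out-name>.<exptype>.<pid> convention and that no path contains whitespace.

```shell
# Print a prof command line covering every data file of one experiment.
# prof_cmdline is a hypothetical helper, not part of SpeedShop.
# usage: prof_cmdline <directory> <a.out-name> <exptype>
prof_cmdline() {
    dir=$1
    name=$2
    exptype=$3
    # Expand <dir>/<a.out-name>.<exptype>.* into the positional parameters.
    set -- "$dir/$name.$exptype".*
    if [ ! -e "$1" ]; then
        echo "no $exptype data for $name in $dir" >&2
        return 1
    fi
    echo "prof $*"
}
```

For example, prof_cmdline . a.out pcsamp prints a single prof command naming every a.out.pcsamp.<pid> file in the current directory.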

For [f]pcsamp[x] and the various *_hwc experiments, a function list annotated with the appropriate metric is produced.

For ideal experiments, the same sort of list is produced, and, if the -gprof flag is added, a list of callers and callees of each function is produced.

For usertime and fpe experiments, a gprof-like list of callers and callees of each function is produced.

There are many additional options to prof; see the prof(1) man page for further details.

CALIPER SAMPLES

In the current release, caliper samples may be recorded, and the -calipers option to prof lets you see the data for any caliper setting.

Caliper samples are supported in three different ways. First, the user can explicitly link with the SpeedShop runtime, and call its API routine to record a caliper sample. Second, the user can define a signal to be used to record a caliper sample, by setting the environment variable _SPEEDSHOP_CALIPER_POINT_SIG, and then send the target process that signal. Third, a caliper-sample trap may be set in either dbx or the WorkShop debugger. In the current debuggers, this is done by planting a stop trap (breakpoint) and, when the process stops, evaluating the expression:
ssrt_caliper_point(1)
The evaluation of the expression always returns zero, but a side effect of the evaluation is the recording of the appropriate data. After evaluation, process execution may be resumed. See the ssapi(3) man page for further details.
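
The signal-driven method involves two steps: starting the experiment with the signal number in the environment, and later sending that signal to the target. The sketch below prints both commands without running them. caliper_by_signal is a hypothetical helper, the signal number 16 is an arbitrary example value, and <pid> is left as a placeholder for the process ID of the running target.

```shell
# Print (do not run) the two commands used for signal-driven caliper samples.
# caliper_by_signal is a hypothetical helper, not part of SpeedShop.
# usage: caliper_by_signal <signal-number> <exptype> <a.out-name> [args]
caliper_by_signal() {
    sig=$1
    exptype=$2
    shift 2
    # Step 1: start the experiment with the caliper signal defined.
    echo "env _SPEEDSHOP_CALIPER_POINT_SIG=$sig ssrun -$exptype $*"
    # Step 2: each time a caliper sample is wanted, signal the target.
    echo "kill -$sig <pid>"
}

caliper_by_signal 16 pcsamp ./a.out
```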

USER ENVIRONMENT VARIABLE CONTROLS

Various environment variables may be used to control the operation of SpeedShop. They are:

_SPEEDSHOP_VERBOSE
causes a log of each program's operation to be written to stderr. If it is set to an empty string, only major events are logged; if it is set to a non-empty string, more detailed events are logged.

_SPEEDSHOP_SILENT
suppresses all output, other than fatal error messages from SpeedShop. If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are set, _SPEEDSHOP_SILENT wins.

_SPEEDSHOP_CALIPER_POINT_SIG <signal-number>
if specified, gives a signal number to be used for recording a caliper-point in the experiment.

_SPEEDSHOP_COMMAND_FILE
if set, specifies the name of a file which may contain additional environment controls, and definitions of additional experiment types. Note: this option is not supported in the current release.

_SPEEDSHOP_OUTPUT_DIRECTORY
if specified, the output data files will be put in the named directory.

_SPEEDSHOP_OUTPUT_FD
if specified, gives the number of the file descriptor to be used for writing the output file. Note: this option is not supported in the current release.

_SPEEDSHOP_REUSE_FILE_DESCRIPTORS
if specified, opens and closes the file descriptors for the output files every time performance data is to be written.

_SPEEDSHOP_OUTPUT_FILENAME
if specified, the given name will be used for the output file; if _SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended to the name.

_SPEEDSHOP_OUTPUT_PIPE
if specified, gives a command and arguments into which the output will be piped. Note: this option is not supported in the current release.

_SPEEDSHOP_HWC_COUNTER_NUMBER
specifies the counter to be used for prof_hwc experiments. Counters are numbered between 0 and 31, and are described in the MIPS R10000 Microprocessor User's Manual, Chapter 14. Counter 0 counters are numbered 0-15, and counter 1 counters are numbered 16-31.

_SPEEDSHOP_HWC_COUNTER_OVERFLOW
specifies the overflow value for the counter to be used in prof_hwc experiments. The value chosen may be any number greater than 0. Some choices may produce data that is not statistically random, but rather reflects a correlation between the overflow interval and a cyclic behavior in the application. Users may want to do two or more runs with different overflow values.

_SPEEDSHOP_OUTPUT_NOCOMPRESS
if specified, disables the compression of performance data.
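
The interaction between _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT described above depends on whether each variable is set, not only on its value. A minimal shell sketch of that rule follows; ss_log_level and the level names it prints are hypothetical, invented for this illustration.

```shell
# Report the logging level implied by the current environment.
# ss_log_level is a hypothetical helper, not part of SpeedShop.
ss_log_level() {
    if [ -n "${_SPEEDSHOP_SILENT+set}" ]; then
        echo silent          # SILENT wins whenever it is set
    elif [ -z "${_SPEEDSHOP_VERBOSE+set}" ]; then
        echo default         # neither variable is set
    elif [ -z "$_SPEEDSHOP_VERBOSE" ]; then
        echo major-events    # VERBOSE set to an empty string
    else
        echo detailed        # VERBOSE set to a non-empty string
    fi
}
```

The ${VAR+set} expansion is used because it distinguishes a variable set to the empty string from one that is not set at all.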

Other variables will be documented in future releases.
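
The numbering used by _SPEEDSHOP_HWC_COUNTER_NUMBER can be checked before a prof_hwc run. The sketch below validates a value against the 0-31 range and reports which hardware counter it belongs to. hwc_decode is a hypothetical helper, and reading the remainder as an event index within that counter is an assumption based on the 0-15 and 16-31 split described above.

```shell
# Split a _SPEEDSHOP_HWC_COUNTER_NUMBER value into counter and event index.
# hwc_decode is a hypothetical helper, not part of SpeedShop.
hwc_decode() {
    case "$1" in
        # Reject empty input, non-digits, and leading zeros (which shell
        # arithmetic would otherwise read as octal).
        ''|*[!0-9]*|0?*) echo "not a plain decimal number: $1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 31 ]; then
        echo "out of range (0-31): $1" >&2
        return 1
    fi
    echo "counter=$(( $1 / 16 )) event=$(( $1 % 16 ))"
}

hwc_decode 21    # prints counter=1 event=5
```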

PROCESS TRACKING ENVIRONMENT VARIABLE CONTROLS

Various environment variables may be used for controlling the treatment of processes spawned from the original target. They are:

_SPEEDSHOP_TRACE_FORK {True|False}
if True, specifies that processes spawned by calls to fork() will be monitored, if they do not call exec(). If they do call exec(), and _SPEEDSHOP_TRACE_FORK_TO_EXEC is not set to True, the data covering the time between the fork() and the exec() will be discarded. It is True by default. Note: in the current release, data will be recorded independent of whether the process calls exec() or not.

_SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
if True, specifies that processes spawned by calls to fork() will be monitored, even if they also call exec(). It is False by default.

_SPEEDSHOP_TRACE_EXEC {True|False}
if True, specifies that processes spawned by calls to any of the various flavors of exec() will be monitored. It is True by default.

_SPEEDSHOP_TRACE_SPROC {True|False}
if True, specifies that processes spawned by calls to sproc() will be monitored. It is True by default.

_SPEEDSHOP_TRACE_SYSTEM {True|False}
if True, specifies that processes spawned by calls to system() will be monitored. It is False by default. Note: this option is not supported in the current release.
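
The defaults listed above can be summarized by printing the effective value of each variable. In this sketch, ss_trace_settings is a hypothetical helper, and treating a variable that is set but empty as if it were unset is an assumption; this page does not say how SpeedShop handles that case.

```shell
# Print the effective process-tracking settings, applying documented defaults.
# ss_trace_settings is a hypothetical helper, not part of SpeedShop.
ss_trace_settings() {
    echo "fork=${_SPEEDSHOP_TRACE_FORK:-True}"
    echo "fork_to_exec=${_SPEEDSHOP_TRACE_FORK_TO_EXEC:-False}"
    echo "exec=${_SPEEDSHOP_TRACE_EXEC:-True}"
    echo "sproc=${_SPEEDSHOP_TRACE_SPROC:-True}"
    echo "system=${_SPEEDSHOP_TRACE_SYSTEM:-False}"
}

ss_trace_settings
```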

EXPERT-MODE ENVIRONMENT VARIABLE CONTROLS

Various additional environment variables may be used for debugging and finer control of the operation of SpeedShop. They are:

_SPEEDSHOP_SAMPLING_MODE
for PC-sampling and hardware-counter profiling, if set to 1, will generate data for the base executable only. If it is not set, or set to anything other than 1, data is generated for the executable and all DSOs it uses.

_SPEEDSHOP_INIT_DEFERRED_SIGNAL <signal-number>
If specified, initialization of the experiment will not be performed when the target process starts, but rather will be delayed until the specified signal is sent to the process. A handler for the given signal will be installed when the process starts, and it is the user's responsibility to ensure that it is not overridden by the target code.

_SPEEDSHOP_EXPERIMENT_TYPE
passes the name of the experiment to the runtime. It is normally set by ssrun(1), but may be overwritten.

_SPEEDSHOP_MARCHING_ORDERS
passes the marching orders of the experiment to the runtime. It is normally set by ssrun(1) from the experiment type, but may be overwritten.

_SPEEDSHOP_SBRK_BUFFER_LENGTH
defines the maximum size of the internal malloc arena used. This arena is completely separate from the user's arena, and has a default size of 0x400000.

_SPEEDSHOP_FILE_BUFFER_LENGTH
defines the size of the buffer used for writing the experiment files. The default length is 8KB. The buffer is only used for writing small records to the file; large records are written directly, to avoid the buffering overhead.

_SPEEDSHOP_DEBUG_NO_SIG_TRAPS
disables the normal setting of signal handlers for all fatal and exit signals.

_SPEEDSHOP_DEBUG_NO_STACK_UNWIND
suppresses the stack unwind that is done in usertime experiments, and at caliper samples for all experiments. The option is used as a workaround for various unwind bugs in libexc.

Other variables will be documented in future releases.
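
The two default buffer sizes above are given in different notations. The following sketch converts both to decimal, assuming that 0x400000 is a byte count and that 8KB means 8 * 1024 bytes; neither unit is stated explicitly on this page, and the variable names are hypothetical.

```shell
# Default buffer sizes from the text above, converted to decimal bytes.
# Units are assumed: bytes for the arena, 1 KB = 1024 bytes for the buffer.
sbrk_default=$((0x400000))    # internal malloc arena default
file_default=$((8 * 1024))    # experiment file buffer default
echo "sbrk arena: $sbrk_default bytes ($((sbrk_default / 1024 / 1024)) MB)"
echo "file buffer: $file_default bytes"
```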

INSTRUMENTATION

Instrumentation is done with the pixie(1) command, invoked automatically by ssrun(1), and, if necessary for DSOs that are opened during a run, by the runtime library. Users normally would not invoke pixie(1) directly.

In the current release, instrumented executables and DSOs appear in the current working directory. In a future release, the DSOs will be cached.

SPEEDSHOP API ROUTINES

The SpeedShop API routines are defined in the include file "SpeedShop/api.h", installed in /usr/include. It defines three entry points, described in the SpeedShop API man page, ssapi(3).

SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES

The SpeedShop facility for users to add custom data capture routines is not available in the current release.

MISCELLANEOUS UTILITY PROGRAMS

Four utility programs are provided, in addition to the main functionality in SpeedShop. They are:

ssusage
is a variant of time(1) that prints more information about the resource usage of a program. See ssusage(1) for more information.

squeeze
is a program which allocates and locks down memory, making the system behave as if it had less physical memory than it really does. See squeeze(1) for more information.

thrash
is a program that allocates memory, and then touches all of the pages, in order to force other pages out of the system's physical memory. See thrash(1) for more information.

fbdump
is a program that dumps out the contents of the compiler feedback files produced by the -feedback option to prof(1). See fbdump(1) and prof(1) for more information.

SEE ALSO

ssrun(1), ssdump(1), prof(1), pixie(1), fbdump(1), ssusage(1), squeeze(1), thrash(1), malloc_ss(3), fpe_ss(3), ssapi(3)