PERFEX(1)
NAME
- perfex - a command line interface to R10000 counters
SYNOPSIS
- perfex [-a | [-e event0] [-e event1]]
- [-mp | -s] [-x] command
DESCRIPTION
- The given command is executed; after it is complete, perfex prints the
values of various hardware performance counters.
The integers event0 and event1 index this table:
- 0 = Cycles
1 = Issued instructions
2 = Issued loads
3 = Issued stores
4 = Issued store conditionals
5 = Failed store conditionals
6 = Decoded branches
7 = Quadwords written back from scache
8 = Correctable scache data array ECC errors
9 = Primary instruction cache misses
10 = Secondary instruction cache misses
11 = Instruction misprediction from scache way prediction table
12 = External interventions
13 = External invalidations
14 = Virtual coherency conditions
15 = Graduated instructions
16 = Cycles
17 = Graduated instructions
18 = Graduated loads
19 = Graduated stores
20 = Graduated store conditionals
21 = Graduated floating point instructions
22 = Quadwords written back from primary data cache
23 = TLB misses
24 = Mispredicted branches
25 = Primary data cache misses
26 = Secondary data cache misses
27 = Data misprediction from scache way prediction table
28 = External intervention hits in scache
29 = External invalidation hits in scache
30 = Store/prefetch exclusive to clean block in scache
31 = Store/prefetch exclusive to shared block in scache
OPTIONS
- -e event
- Specify an event to be counted.
Zero, one, or two event specifiers may be given; the default is to
count cycles. Events may also be specified by setting one or both of
the environment variables T5_EVENT0 and T5_EVENT1. Command line
event specifiers, if present, override these. The order in which
events are specified is not important. The counts, together with an
event description, are written to stderr. Two events which *must* be
counted on the same hardware counter (see r10k_counters(5)) will
cause a conflicting counters error.
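The conflict rule can be illustrated with a short sketch. It assumes that events 0-15 belong to one hardware counter and events 16-31 to the other, as the layout of the event table suggests, and that the two events listed twice (cycles, graduated instructions) can be counted on either counter. Both points are assumptions; r10k_counters(5) is the authority. The names possible_counters and conflicts are hypothetical.

```python
# Sketch only. The counter assignment is inferred from the event table above
# (events 0-15 on counter 0, events 16-31 on counter 1); it is an assumption,
# not something taken from r10k_counters(5).

# Assumption: events listed twice in the table may use either counter.
EITHER_COUNTER = {0, 16, 15, 17}


def possible_counters(event):
    """Return the set of counters (0 and/or 1) assumed able to count event."""
    if not 0 <= event <= 31:
        raise ValueError(f"event {event} is outside 0..31")
    if event in EITHER_COUNTER:
        return {0, 1}
    return {event // 16}


def conflicts(event_a, event_b):
    """True if both events are assumed to need the same single counter."""
    choices_a = possible_counters(event_a)
    choices_b = possible_counters(event_b)
    # No conflict if some assignment puts the two events on different counters.
    return not any(a != b for a in choices_a for b in choices_b)
```

Under these assumptions, conflicts(26, 10) is false, while conflicts(25, 26) is true because both data cache miss events fall in the 16-31 range.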
- -a
- Multiplex over all events, projecting totals. Ignore event
specifiers.
The option -a produces counts for all events by multiplexing over 16
events per counter. The OS does the switching round robin at clock
interrupt boundaries. The resulting counts are normalized by
multiplying by 16 to give an estimate of the values they would have
had for exclusive counting. Due to the equal-time nature of the
multiplexing, you are guaranteed that any events present in large
enough numbers to contribute significantly to the execution time
will be fairly represented. Events concentrated in a few short
regions (say, icache misses) may not be projected very accurately.
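The projection, and why bursty events project poorly, can be sketched as follows. The model is a simplification: execution is divided into equal time slices, and each event owns its counter in every 16th slice. The slice model, the function name multiplexed_estimate, and all counts are illustrative assumptions, not perfex output.

```python
EVENTS_PER_COUNTER = 16  # from the text: 16 events multiplexed per counter


def multiplexed_estimate(per_slice_counts, turn):
    """Estimate a total from the slices in which this event was counted.

    per_slice_counts[i] is the true number of events in time slice i
    (hypothetical data); turn is the event's position in the round robin.
    """
    observed = sum(
        count
        for i, count in enumerate(per_slice_counts)
        if i % EVENTS_PER_COUNTER == turn
    )
    # Normalize by multiplying by 16, as described above.
    return observed * EVENTS_PER_COUNTER


# An event spread evenly over 32 slices projects to its true total.
uniform = [100] * 32
# An event concentrated in one slice is either missed or overestimated.
bursty = [0] * 32
bursty[5] = 3200

print(multiplexed_estimate(uniform, 3), sum(uniform))
print(multiplexed_estimate(bursty, 3), sum(bursty))
print(multiplexed_estimate(bursty, 5), sum(bursty))
```

In this model the evenly spread event is projected exactly, while the concentrated event is projected as zero or as 16 times its true total, depending on which slice the burst falls in.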
- -mp
- Report per-thread counts for mp programs as well as (default)
totals.
By default perfex aggregates the counts of all the child threads and
reports this number for each selected event. The -mp option causes
the counters for each thread to be collected at thread exit time and
printed out, followed by the counts aggregated across all threads.
The counts are labeled by pid.
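The relationship between the per-thread counts and the aggregate can be sketched as a per-event sum. The pids and counts below are made up for illustration, and the function name aggregate is hypothetical; the sketch does not reproduce the perfex output format.

```python
from collections import Counter


def aggregate(per_thread):
    """Sum counts across threads.

    per_thread maps pid -> {event number: count}; the result maps
    event number -> total over all threads.
    """
    totals = Counter()
    for counts in per_thread.values():
        totals.update(counts)  # Counter.update adds counts per key
    return dict(totals)


# Hypothetical per-thread counts for events 0 (cycles) and 26 (scache misses).
per_thread = {
    1201: {0: 500, 26: 7},
    1202: {0: 300, 26: 5},
}
print(aggregate(per_thread))
```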
- -s
- Start (stop) counting when the perfex process receives SIGUSR1
(SIGUSR2).
This option causes perfex to wait until it (i.e. the perfex process)
receives a SIGUSR1, before it starts counting (for the child
process). It will stop counting if it receives a SIGUSR2. Repeated
cycles of this will aggregate counts. If no SIGUSR2 is received, the
counting will continue until the child exits (a normal case).
- -x
- Count at exception level (as well as the default user level).
Exception level includes time spent on behalf of the user during,
e.g., TLB refill exceptions. Other counting modes (kernel,
supervisor) are available through the OS ioctl interface (see
r10k_counters(5)).
EXAMPLE
- To collect instruction and data scache miss counts for a program normally
- executed by
% bar < bar.in > bar.out
run
% perfex -e 26 -e 10 bar < bar.in > bar.out
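When perfex is driven from a script, the same invocation can be assembled programmatically. The sketch below uses only the -e option described above; the helper names perfex_argv and run_counted are hypothetical. Because the counts are written to stderr, the sketch captures stderr and returns it unparsed, since the report format is not specified here.

```python
import subprocess


def perfex_argv(command, events=()):
    """Build an argument list such as ['perfex', '-e', '26', '-e', '10', 'bar']."""
    if len(events) > 2:
        raise ValueError("at most two -e event specifiers may be given")
    argv = ["perfex"]
    for event in events:
        argv += ["-e", str(event)]
    return argv + list(command)


def run_counted(command, events, stdin_path, stdout_path):
    """Run command under perfex and return the captured stderr text, unparsed."""
    with open(stdin_path, "rb") as src, open(stdout_path, "wb") as dst:
        done = subprocess.run(
            perfex_argv(command, events),
            stdin=src,
            stdout=dst,
            stderr=subprocess.PIPE,  # the counts are written to stderr
            check=True,
        )
    return done.stderr.decode()
```

For the example above, run_counted(["bar"], [26, 10], "bar.in", "bar.out") would correspond to the second command line. Anything the command itself writes to stderr is captured along with the counts.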
DEPENDENCIES
- perfex only works on an R10000 system. The -mp option currently
supports only mips4 (-n32 or -64) binaries linked -shared, because of
a dependency on libperfex.so. The options -s and -mp are currently
mutually exclusive.
SEE ALSO
- r10k_counters(5), libperfex(3), time(1), timex(1), ecadmin(1), ecstats(1)