PERFEX(1)

NAME

perfex - a command line interface to R10000 counters

SYNOPSIS

perfex [-a | [-e event0] [-e event1]]
[-mp | -s] [-x] command

DESCRIPTION

The given command is executed; after it is complete, perfex prints the values of various hardware performance counters.

The integers event0 and event1 index this table:
           0 = Cycles
           1 = Issued instructions
           2 = Issued loads
           3 = Issued stores
           4 = Issued store conditionals
           5 = Failed store conditionals
           6 = Decoded branches
           7 = Quadwords written back from scache
           8 = Correctable scache data array ECC errors
           9 = Primary instruction cache misses
          10 = Secondary instruction cache misses
          11 = Instruction misprediction from scache way prediction table
          12 = External interventions
          13 = External invalidations
          14 = Virtual coherency conditions
          15 = Graduated instructions
          16 = Cycles
          17 = Graduated instructions
          18 = Graduated loads
          19 = Graduated stores
          20 = Graduated store conditionals
          21 = Graduated floating point instructions
          22 = Quadwords written back from primary data cache
          23 = TLB misses
          24 = Mispredicted branches
          25 = Primary data cache misses
          26 = Secondary data cache misses
          27 = Data misprediction from scache way prediction table
          28 = External intervention hits in scache
          29 = External invalidation hits in scache
          30 = Store/prefetch exclusive to clean block in scache
          31 = Store/prefetch exclusive to shared block in scache

OPTIONS

-e event
Specify an event to be counted.
          Zero, one, or two event specifiers may be given; by default,
          cycles are counted.  Events may also be specified by setting one or
          both of the environment variables T5_EVENT0 and T5_EVENT1.  Command
          line event specifiers, if present, override these.  The order in
          which events are specified is not important.  The counts, together
          with an event description, are written to stderr.  Specifying two
          events which *must* be counted on the same hardware counter (see
          r10k_counters(5)) causes a conflicting counters error.
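
          The conflict rule can be illustrated with a short Python sketch.
          It assumes, from the layout of the event table, that events 0-15
          belong to hardware counter 0 and events 16-31 to counter 1, and
          that cycles (0, 16) and graduated instructions (15, 17) can be
          counted on either counter.  r10k_counters(5) remains the
          authority on this.  The names counters_for and assign_counters
          are hypothetical and are not part of perfex.

```python
# Assumption: events 0-15 are on counter 0 and events 16-31 on counter 1.
# Cycles (0/16) and graduated instructions (15/17) appear in both halves of
# the table, so this sketch treats them as countable on either counter.
ON_EITHER_COUNTER = {0, 16, 15, 17}


def counters_for(event):
    """Return the set of hardware counters assumed able to count `event`."""
    if not 0 <= event <= 31:
        raise ValueError(f"event out of range: {event}")
    if event in ON_EITHER_COUNTER:
        return {0, 1}
    return {0} if event < 16 else {1}


def assign_counters(event_a, event_b):
    """Return (counter_a, counter_b) for a conflict-free pair of events.

    Raises ValueError when both events must share one hardware counter.
    """
    for counter_a in sorted(counters_for(event_a)):
        for counter_b in sorted(counters_for(event_b)):
            if counter_a != counter_b:
                return (counter_a, counter_b)
    raise ValueError(f"conflicting counters: events {event_a} and {event_b}")
```

          Under these assumptions, events 26 and 10 can be counted together,
          while events 25 and 26 cannot.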
-a
Multiplex over all events, projecting totals. Ignore event specifiers.
          The option -a produces counts for all events by multiplexing over 16
          events per counter. The OS does the switching round robin at clock
          interrupt boundaries. The resulting counts are normalized by
          multiplying by 16 to give an estimate of the values they would have
          had for exclusive counting. Due to the equal-time nature of the
          multiplexing, you are guaranteed that any events present in large
          enough numbers to contribute significantly to the execution time
          will be fairly represented. Events concentrated in a few short
          regions (say, icache misses) may not be projected very accurately.
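
          The projection described above can be sketched in Python.  The
          sketch models each clock interval as one list element and assumes
          a strict round robin in which an event is counted during every
          16th interval; the function names and the sample data are
          hypothetical, chosen only to show why steady events project well
          and bursty events do not.

```python
EVENTS_PER_COUNTER = 16  # -a multiplexes over 16 events per counter


def project(raw_count):
    """Scale a multiplexed count to an estimate for exclusive counting."""
    return raw_count * EVENTS_PER_COUNTER


def multiplexed_estimate(per_interval_counts, slot):
    """Count an event only in the intervals where its slot is active,
    then project the sampled total."""
    sampled = sum(
        count
        for interval, count in enumerate(per_interval_counts)
        if interval % EVENTS_PER_COUNTER == slot
    )
    return project(sampled)


# A steady event: 100 occurrences in each of 160 intervals (true total 16000).
steady = [100] * 160

# A bursty event: the same true total, all inside interval 3.
bursty = [0] * 160
bursty[3] = 16000

steady_estimate = multiplexed_estimate(steady, slot=0)
bursty_missed = multiplexed_estimate(bursty, slot=0)
bursty_caught = multiplexed_estimate(bursty, slot=3)
```

          In this model the steady event is projected exactly, while the
          bursty event is reported either as zero or as 16 times its true
          total, depending on which interval happens to sample it.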
-mp
Report per-thread counts for mp programs as well as (default) totals.
          By default perfex aggregates the counts of all the child threads and
          reports this number for each selected event. The -mp option causes
          the counters for each thread to be collected at thread exit time and
          printed out, followed by the counts aggregated across all threads.
          The counts are labeled by pid.
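
          The relationship between the per-thread counts and the totals can
          be sketched in Python.  The pids and counts below are made-up
          illustrative values, and the dictionary layout is an assumption of
          this sketch, not the perfex output format.

```python
from collections import Counter


def aggregate(per_thread):
    """Sum {pid: {event: count}} into {event: total across all threads}."""
    totals = Counter()
    for counts in per_thread.values():
        totals.update(counts)  # adds counts event by event
    return dict(totals)


# Hypothetical per-thread counts for events 0 (cycles) and 26
# (secondary data cache misses), keyed by pid.
per_thread = {
    1201: {0: 5_000_000, 26: 1_200},
    1202: {0: 4_800_000, 26: 900},
}

totals = aggregate(per_thread)
```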
-s
Start counting when the perfex process receives SIGUSR1; stop on SIGUSR2.
          This option causes perfex to wait until it (i.e. the perfex process)
          receives a SIGUSR1, before it starts counting (for the child
          process). It will stop counting if it receives a SIGUSR2. Repeated
          cycles of this will aggregate counts. If no SIGUSR2 is received, the
          counting will continue until the child exits (a normal case).
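
          A controlling script could drive this protocol as in the Python
          sketch below.  The helper count_window and its send and sleep
          parameters are hypothetical; perfex_pid is assumed to be the pid
          of a perfex process already started with -s.  The signals go to
          perfex itself, not to the child command.

```python
import os
import signal
import time


def count_window(perfex_pid, seconds, send=os.kill, sleep=time.sleep):
    """Ask perfex to count for roughly `seconds`, then stop.

    `send` and `sleep` are injectable so the sketch can be exercised
    without a running perfex process.
    """
    send(perfex_pid, signal.SIGUSR1)  # start counting
    try:
        sleep(seconds)
    finally:
        send(perfex_pid, signal.SIGUSR2)  # stop counting
```

          Calling count_window more than once for the same perfex process
          corresponds to the repeated start/stop cycles described above,
          whose counts are aggregated.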
-x
Count at exception level (as well as the default user level).
          Exception level includes time spent on behalf of the user during,
          e.g., TLB refill exceptions.  Other counting modes (kernel,
          supervisor) are available through the OS ioctl interface (see
          r10k_counters(5)).

EXAMPLE

To collect instruction and data scache miss counts for a program normally
executed by

          % bar < bar.in > bar.out

run

          % perfex -e 26 -e 10 bar < bar.in > bar.out

DEPENDENCIES

perfex works only on an R10000 system. The -mp option currently supports only mips4 (-n32 or -64) binaries linked -shared, because of a dependency on libperfex.so. The options -s and -mp are currently mutually exclusive.

SEE ALSO

r10k_counters(5), libperfex(3), time(1), timex(1), ecadmin(1), ecstats(1)