- Timing Codes
- Profiling
1. Timing Codes
The types of time reported by the tools below are defined as follows:
- user -- the amount of CPU time used by the user's program
- sys (or system) -- the amount of CPU time used by the system in support of the user's program
- cpu -- the total CPU time, i.e., user + sys
- wall -- the wall clock time, i.e., elapsed real time
For a serial code, the cpu time and the wall clock time are typically close to each other,
unless other user processes are competing for the processor or the code spends significant
time waiting on the system, for example on heavy disk I/O or on swapping/paging.
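The relationship between these quantities can also be observed from inside a program. The following is a minimal Python sketch, not one of the tools described on this page. It assumes the standard-library calls os.times() and time.perf_counter(); the helper name measure and the two workloads are made up for illustration.

```python
import os
import time


def measure(func):
    """Run func() and return its user, sys, cpu and wall times in seconds."""
    cpu_start = os.times()            # fields include .user and .system
    wall_start = time.perf_counter()  # high-resolution wall clock
    func()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    user = cpu_end.user - cpu_start.user
    sys_time = cpu_end.system - cpu_start.system
    return {
        "user": user,
        "sys": sys_time,
        "cpu": user + sys_time,        # cpu = user + sys
        "wall": wall_end - wall_start,  # elapsed real time
    }


def busy():
    # CPU-bound work: cpu time should be close to wall time on an idle machine.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle():
    # Sleeping uses almost no CPU, so wall time exceeds cpu time.
    time.sleep(0.2)


if __name__ == "__main__":
    for name, work in (("busy", busy), ("idle", idle)):
        t = measure(work)
        print(f"{name}: user={t['user']:.2f} sys={t['sys']:.2f} "
              f"cpu={t['cpu']:.2f} wall={t['wall']:.2f}")
```

On a lightly loaded machine the busy case should show cpu close to wall, while the idle case should show wall well above cpu, which is the same gap that heavy I/O waits or paging would produce.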
1.1 time (/usr/bin/time)
The quickest way to time a code is to run it under the /usr/bin/time
command, which reports the user time, the system time, and the total
wall clock time. See the man page for time for more information on the
command, especially on formatting the output.
Note that the csh and tcsh shells have a built-in command that is also
called time, so give the full path to be sure of running /usr/bin/time.
% /usr/bin/time a.out
Use the -p option to print the times in the portable (POSIX) output format.
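The -p format prints one "real", "user" and "sys" line each, which makes it easy to post-process. The following is a minimal Python sketch under that assumption. The function name parse_time_p and the sample numbers are hypothetical, and the sample text stands in for output captured from the command's standard error.

```python
def parse_time_p(text):
    """Parse the real/user/sys lines of `time -p` output into floats (seconds)."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only the three timing lines; ignore any program output mixed in.
        if len(parts) == 2 and parts[0] in ("real", "user", "sys"):
            times[parts[0]] = float(parts[1])
    return times


# Sample text in the -p layout; the numbers are made up for illustration.
sample = "real 2.50\nuser 1.75\nsys 0.25\n"
t = parse_time_p(sample)
cpu = t["user"] + t["sys"]  # cpu = user + sys
print(f"cpu = {cpu:.2f} s, wall = {t['real']:.2f} s")
# prints: cpu = 2.00 s, wall = 2.50 s
```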
1.2 gprof
gprof is currently not functioning correctly with MPI (MPT) codes.
A quick way to get more detailed information on functions and routines
is to use the profiling tool gprof.
The first step is to compile the source code with the compiler flags for
profiling. For the Intel compiler the flags are -p -g, and for the GNU
compiler the flag is -pg. For the Intel compiler the -g flag does not
change the optimization level selected by a -O flag, if one is present.
The second step is to execute the code, which generates a gmon.out file.
To analyze the gmon.out file, use gprof; the results of the analysis are
written to stdout. The flat profile contains a breakdown of the time
spent in each function and subroutine. The call graph profile contains
the inclusive and exclusive time spent in subroutines and functions. See
the man pages for the Intel and GNU compilers for information about the
compiler flags for profiling, and see the man page for gprof for its
options.
% ifort -O -p -g myprog.f # or gcc -O -pg myprog.c
% ./a.out
% gprof --flat-profile a.out gmon.out
See the section on Profiling below for more information about
using gprof.
For even easier timing and profiling without re-compiling, consider using
psrun from PerfSuite.
2. Profiling
2.1 gprof
gprof is currently not functioning correctly with MPI (MPT) codes.
A quick way to get more detailed information on functions and routines
is to use the profiling tool gprof.
The first step is to compile the source code with the compiler flags for
profiling. For the Intel compiler the flags are -p -g, and for the GNU
compiler the flag is -pg. For the Intel compiler the -g flag does not
change the optimization level selected by the -O flag, if one is present.
The second step is to execute the code, which generates a gmon.out file.
To analyze the gmon.out file, use gprof; the results of the analysis are
written to stdout.
% ifort -O -p -g myprog.f # or gcc -O -pg myprog.c
% ./a.out
% gprof a.out gmon.out
The 'flat' profile contains a breakdown of the time spent in each
function and subroutine. The 'call graph' profile contains the
inclusive and exclusive time spent in subroutines and functions. See
the man pages for the Intel and GNU compilers for information about the
compiler flags for profiling, and see the man page for gprof for its
options.
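To make the difference between exclusive and inclusive time concrete: exclusive time is what a routine spends in its own code, and inclusive time adds the time spent in everything it calls. The following is a minimal Python sketch of that arithmetic. The call tree, routine names and timings are invented, and each routine is assumed to be called from a single parent, which is simpler than the call-count-based attribution gprof actually performs.

```python
# Hypothetical call tree: routine -> (exclusive seconds, list of callees).
# Names and timings are invented for illustration only.
CALL_TREE = {
    "main":   (0.5, ["solve", "output"]),
    "solve":  (2.0, ["kernel"]),
    "kernel": (6.0, []),
    "output": (1.5, []),
}


def inclusive_time(name, tree=CALL_TREE):
    """Exclusive (self) time of a routine plus the inclusive time of its callees."""
    self_time, callees = tree[name]
    return self_time + sum(inclusive_time(c, tree) for c in callees)


for routine, (self_time, _) in CALL_TREE.items():
    print(f"{routine:8s} exclusive={self_time:4.1f}  "
          f"inclusive={inclusive_time(routine):4.1f}")
```

In this made-up tree, main has the largest inclusive time because it contains the whole run, while kernel has the largest exclusive time and is where optimization effort would pay off.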
An undocumented environment variable is GMON_OUT_PREFIX. When it is set and
a threaded or MPI code is profiled, each process generates a gmon file called $GMON_OUT_PREFIX.pid.
Each gmon file can then be analyzed separately, or the aggregate sum can be produced by gprof
(written to gmon.sum) and examined as a whole:
% gprof -s a.out $GMON_OUT_PREFIX.*
% gprof a.out gmon.sum
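When many per-process files are produced, the two commands above can be assembled by a small script. The following is a minimal Python sketch that only builds the command lines and does not run gprof. The function name gprof_sum_commands is hypothetical, and the executable is placed before the profile files, which is the argument order given in the gprof synopsis.

```python
import glob
import os


def gprof_sum_commands(executable, prefix, directory="."):
    """Return the two gprof command lines that merge, then report, the
    per-process profiles named <prefix>.<pid> found in `directory`."""
    files = sorted(glob.glob(os.path.join(directory, prefix + ".*")))
    if not files:
        raise FileNotFoundError(f"no profile files match {prefix}.*")
    merge = ["gprof", "-s", executable] + files   # gprof -s writes gmon.sum
    report = ["gprof", executable, "gmon.sum"]    # analyze the merged profile
    return merge, report
```

Each returned list can be passed to subprocess.run() from the directory where gmon.sum should be written.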
2.2 PerfSuite
PerfSuite provides a profiling tool called psrun, which extends the
functionality of the timing and profiling tools mentioned above. See
the Linux Journal article "Measuring and Improving Application
Performance with PerfSuite" for an introduction to PerfSuite.
The simplest way to use psrun is with an existing executable:
Serial codes
% soft add +perfsuite
% resoft
% psrun ./a.out
% psprocess a.out.PID.xml # PID is the process ID when a.out was run
OpenMP and MPI codes
See the discussion on the PerfSuite page.
See the documentation for psprocess
for information on analyzing the XML files generated by psrun.
Performance Engineering and Computational Methods Group (PECM)
High-End Computing Division