VProf (The Visual Profiler) at NCSA

The VProf visual profiler from Sandia National Laboratories is available on the NCSA Xeon cluster (Tungsten), TeraGrid Itanium 2 cluster (Mercury), and SGI Altix (Cobalt). VProf was developed by Curtis Janssen. The VProf Home Page provides more information and a brief user guide.

In many cases, you'll need to make no source code changes to use VProf. You'll only need to recompile and relink your application. The recompilation step is necessary in order to include symbol information that VProf uses. Refer to the VProf User Guide for details.

Two screenshots of VProf in action are also available.


The VProf package consists of a library and two tools: the graphical tool vprof and the text-based tool cprof. VProf allows you to profile your application in several different ways:

  1. Statistical sampling of the program counter (as with traditional profilers), using the profil(3) subroutine. This is the default method of profiling.
  2. Using hardware performance counter statistics gathered through the PAPI library from the Innovative Computing Laboratory at the University of Tennessee-Knoxville.
  3. A direct interface to the x86 Linux "perfctr" performance counter kernel patch developed by Mikael Pettersson. This option is not available on Tungsten because the perfctr support in VProf 0.12 does not include Pentium 4 performance counters.

/usr/apps/tools/vprof/bin
/usr/projects/perftools/vprof/bin (on Mercury)

The command-line utilities vprof and cprof.

/usr/apps/tools/vprof/lib
/usr/projects/perftools/vprof/lib (on Mercury)

Libraries and object files that you can link with your application.

The performance counters on a number of platforms, including Xeon, support hardware interrupt on overflow. This works much like traditional time-based profiling. With time-based profiling, an "alarm" is set periodically; when it expires after some amount of time (the threshold), the program counter is sampled and the position in the program is noted. In the same way, hardware counters that support interrupt on overflow can be programmed to notify software each time a given number of occurrences of a particular event (for example, level 2 cache misses) has been counted. This is a very flexible generalization of traditional statistical profiling techniques.

Unfortunately, no single "threshold" suits every event, because events occur at very different rates (for example, level 1 cache misses probably occur far more frequently than translation lookaside buffer misses). You will need to adjust the threshold yourself according to the particular hardware event that you are monitoring.
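As a rough illustration of why one threshold does not suit every event, the expected number of samples is approximately the total event count divided by the threshold. The short POSIX sh sketch below uses made-up event totals (they are not measurements from any NCSA system), and the helper name expected_samples is hypothetical.

```shell
#!/bin/sh
# Rough estimate: one sample is taken each time the counter reaches the
# threshold, so samples ~= total events / threshold (integer division).
# Arguments: total event count, threshold.
expected_samples() {
    echo $(( $1 / $2 ))
}

# Hypothetical event totals for one run (illustrative numbers only).
l1_misses=500000000     # a frequent event
tlb_misses=2000000      # a rare event

echo "L1 misses,  threshold 100000: $(expected_samples $l1_misses 100000) samples"
echo "TLB misses, threshold 100000: $(expected_samples $tlb_misses 100000) samples"
echo "TLB misses, threshold 1000:   $(expected_samples $tlb_misses 1000) samples"
```

With these made-up totals, a threshold of 100000 gives 5000 samples for the frequent event but only 20 for the rare one, which is too few for a useful profile; lowering the threshold to 1000 raises the rare event to 2000 samples.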

At NCSA, VProf has been modified to accept the environment variable VMON_FREQ. You can set this variable to any integer value, which is then used as the interrupt threshold during the run of your program. You'll probably want to experiment to find a value that gives good results: too high a value will result in an inexact profile, while too low a value will likely slow your application significantly because of excessive calls to the interrupt handler. It's probably best to start high and decrease the value until you get reasonable results. The default value of VMON_FREQ is 100000.
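One way to experiment with the threshold is to script a few runs with successively smaller values and keep each profile under its own name. The POSIX sh sketch below only prints the commands it would run. The program name ./myprog, the starting value, the halving step, the helper name vmon_freq_sweep, and the vmon.out.freqN file names are illustrative assumptions; PAPI_FP_OPS is simply the event used elsewhere on this page.

```shell
#!/bin/sh
# Print one run command per threshold, starting high and halving each time.
# Arguments: starting threshold, number of runs.
vmon_freq_sweep() {
    freq=$1
    runs=$2
    i=0
    while [ "$i" -lt "$runs" ]; do
        # "env" sets the variables for that one run only; the profile is
        # renamed afterwards so each run's result is kept separately.
        echo "env VMON=PAPI_FP_OPS VMON_FREQ=$freq ./myprog && mv vmon.out vmon.out.freq$freq"
        freq=$(( $freq / 2 ))
        i=$(( $i + 1 ))
    done
}

# Dry run: thresholds 400000, 200000, 100000.
vmon_freq_sweep 400000 3
```

Pipe the output to sh (or remove the echo) to actually run the commands, then compare the resulting profiles with cprof.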


Here are basic instructions for preparing your program for profiling and then using the VProf graphical or text-based tools. You can find more detailed information in the VProf User Guide.

Important note

If possible, you should link your application statically (with the option -static). As mentioned in the VProf documentation, if routines in shared libraries are sampled, they will be outside of the range of VProf's profiling buffer and no information about the event will be recorded. If, when linking statically, you encounter link-time errors referring to missing symbols with "pthread" in their name, then you should try adding the flag -lpthread to the end of your link line. If you cannot link your application statically, you can still run your dynamically-linked program and obtain a profile but you should be aware of the possibility of missing samples.

Tip

Because VProf can make use of the PAPI library, you may want to review the NCSA PAPI page for more complete instructions on that software and the steps to follow when building a program that uses PAPI. This page also contains a listing of the PAPI events that are available on the Xeon platform (these are the values that you might supply to VProf using the VMON environment variable).

Generating VProf profiles with PerfSuite

You can also use the tools psrun and psprocess, which are part of PerfSuite, to generate VProf-format profiles from your application. This provides a way to use cprof and/or vprof without the need for relinking. Note that psrun requires dynamic linking and therefore will only generate VProf profiles that contain information from your main program (you can use psprocess independently from VProf to view shared library profile data).

Compiling and linking a single-processor program

To compile and link the single-processor program "myprog.f" for VProf-profiling on the Tungsten cluster:

% ifc -c -g myprog.f
% ifc -static -o myprog myprog.o \
        /usr/apps/tools/vprof/lib/vmonauto_gcc.o \
        -L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
        -lvmon -lpapi

After your program completes successfully, you should have a single file named "vmon.out" that contains the result of profiling.

Compiling and linking an MPI program

To compile and link the MPI program "mpiprog.f" for VProf-profiling on the Tungsten cluster:

ChaMPIon/Pro
% cmpifc -c -g mpiprog.f
% cmpifc -static -o mpiprog mpiprog.o \
        /usr/apps/tools/vprof/lib/vmonauto_pmpi.o \
        -L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
        -lvmon -lpapi
MPICH-GM
% mpif77 -c -g mpiprog.f
% mpif77 -static -o mpiprog mpiprog.o \
        /usr/apps/tools/vprof/lib/vmonauto_pmpi.o \
        -L/usr/apps/tools/vprof/lib -L/usr/apps/tools/papi3/lib \
        -lvmon -lpapi -lpthread

After your program completes successfully, you should have one or more output files in your working directory. Each will be named "vmon.out.ID", where "ID" is an integer that corresponds to the MPI task ID assigned during the run.

To enable automatic profiling of your program (as shown in these examples), you should link in an additional object file, depending on the programming model in use. You can choose from:

vmonauto_gcc.o
For use with serial applications, this file causes profiling to start when your application begins and terminates profiling just before your program exits.
vmonauto_pmpi.o
For use with MPI applications, this file will cause profiling to begin when MPI_Init() is called and will terminate profiling when MPI_Finalize() is called.

Running your program and obtaining a VProf profile

Before you run a VProf-linked program, you should select the type of profiling you'd like by setting the environment variable VMON appropriately (refer to the VProf User Guide for details). For example, to obtain a profile based on the number of total floating point operations during the run of your program as measured by PAPI, you would enter:

% setenv VMON PAPI_FP_OPS

Note:
By default, MPICH-GM passes only the environment variables DISPLAY and LD_LIBRARY_PATH to the remote tasks. To set VMON for an MPICH-GM job, you may want to use the following form for your MPI job launch command, where X is the number of MPI tasks:

% mpirun.ch_gm VMON=PAPI_FP_OPS -np X mpiprog

If you've linked your application dynamically, you'll also want to set up your environment properly so that the PAPI shared library can be located at runtime. On Tungsten, you can use the SoftEnv package for this:

% soft add +papi3

Then run your program as you normally would. If all goes well, you will have one or more VProf profiles in your working directory as described above.

Viewing the results

To view the profiles, invoke cprof or vprof with the name of your executable program followed by the names of the "vmon.out" files that you'd like to view (you can also do this from inside vprof). For example,

% cprof -e myprog vmon.out

In this example, the option -e asks cprof to display "everything" in the profiles. Without this option, you'll receive a brief summary of the information contained in the profiles. cprof also supports the option -h, which displays a summary of this and other options that you can use to tailor the output to your needs.

Note that you can supply multiple VProf profiles on the command line (for example, when working with output from multiple MPI tasks) and VProf's tools will present the results in an aggregate form. Here's an example:

% cprof -e myprog vmon.out.0 vmon.out.1 vmon.out.2 vmon.out.3
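With many MPI tasks, typing every file name becomes tedious. A minimal POSIX sh sketch, assuming the profiles follow the "vmon.out.ID" naming described earlier and sit in a single directory, lets the shell's wildcard expansion collect them. The helper name cprof_all_cmd is hypothetical, and it prints the command rather than running it.

```shell
#!/bin/sh
# Print a cprof command that covers every vmon.out.ID file in a directory.
# Arguments: executable name, directory holding the profiles (default: .).
cprof_all_cmd() {
    exe=$1
    dir=${2:-.}
    # Let the shell expand the wildcard into the positional parameters.
    set -- "$dir"/vmon.out.*
    # If nothing matched, the pattern is left unexpanded and names no file.
    if [ ! -e "$1" ]; then
        echo "no vmon.out.* files found in $dir" >&2
        return 1
    fi
    echo "cprof -e $exe $*"
}
```

For example, calling cprof_all_cmd myprog in a directory that holds only vmon.out.0 through vmon.out.3 prints "cprof -e myprog ./vmon.out.0 ./vmon.out.1 ./vmon.out.2 ./vmon.out.3". The wildcard sorts names lexically (vmon.out.10 comes before vmon.out.2), which should not matter when the tools aggregate the results.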


Please refer to the VProf User Guide for additional information about VProf.