PAPI

NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.illinois.edu
217-244-0710
help@ncsa.illinois.edu

Overview

The PAPI (Performance Application Programming Interface) library from the Innovative Computing Laboratory at the University of Tennessee-Knoxville is available on the Dell NVIDIA cluster (Forge) and SGI Altix UV cluster (Ember) at NCSA. PAPI is an effort to establish a uniform, standard programming interface for accessing hardware performance counters on modern microprocessors. The PAPI web site is located at:

http://icl.cs.utk.edu/papi/

Hardware performance counters can be very useful for tuning the performance of applications and for evaluating the effectiveness of the compiler on your application. These counters allow you to directly measure the actual usage of the hardware as your application runs and may help you to diagnose bottlenecks in your application's performance. By using PAPI, you gain the benefit of a cross-platform interface to the counters, allowing you to maintain a common source for a wide variety of architectures.

This page is an overview of information specific to, or of particular interest to, users of PAPI at NCSA. The PAPI web site has a number of documents that cover PAPI in more detail, including The PAPI Reference, which is tailored for end users and people new to PAPI. The site also links to an in-depth tutorial by members of the PAPI development team entitled Performance Tuning Using Hardware Counter Data.

PAPI provides both a simple, high-level interface that may be suitable for your needs and a low-level interface that gives you much more control over PAPI, including access to native hardware events that are not part of the PAPI standard event definitions. Neither the low-level interface nor access to native events through PAPI is covered here; please refer to the PAPI web site and processor-specific documentation for details.


System  Kernel PMU support  PAPI version  Directory
Forge   perf_events         4.4.0         /usr/apps/tools/papi/4.4.0-forge
Ember   perf_events         4.4.0         /usr/apps/tools/papi/4.4.0-login (for use on the login node)
                                          /usr/apps/tools/papi/4.4.0-cmp (for use on the compute nodes)


The PAPI directory contains the compiled libraries, include files, UNIX manual pages, and example programs from the PAPI distribution. Make sure that its include and lib subdirectories are on the compiler's search paths for include files and libraries when you compile and link (see below).


PAPI include files for Fortran

There are three different Fortran include files that you can choose from when compiling your PAPI-enabled Fortran program:
fpapi.h
This include file requires C-style preprocessing. Several compilers pass a Fortran source file with the suffix ".F" (uppercase F) through the C preprocessor; consult the documentation for the compiler you are using for specifics.
f77papi.h
This is a Fortran 77-style include file. This file requires no C preprocessing, so you may find it more convenient to use.
f90papi.h
This is a Fortran 90-style include file. Like f77papi.h, this file requires no C preprocessing, so you may find it more convenient to use.

PAPI libraries

If you link with the shared (.so) version of the library, you will have to specify where the PAPI shared library can be found at runtime. On the Linux clusters you can do this in either of two ways:
  • Add the directory /usr/apps/tools/papi/<version>/lib to your LD_LIBRARY_PATH environment variable (for example, /usr/apps/tools/papi/4.4.0-forge/lib on Forge).
  • Specify the option:
    -Wl,-rpath,/usr/apps/tools/papi/<version>/lib
    
    when you link your executable. The -rpath option allows you to add directories to the runtime linker's search path.

If you link the static version of the PAPI library into your program, your executable should run without having to modify the LD_LIBRARY_PATH environment variable. You can cause the static version to be used by specifying -static at link time, or by including /usr/apps/tools/papi/<version>/lib/libpapi.a on your link command.
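If a program linked against the shared library stops at startup because libpapi.so cannot be found, one thing to check is whether the PAPI lib directory appears as a complete entry in LD_LIBRARY_PATH. The C function below is a hypothetical helper (path_list_contains is not part of PAPI or the C library); it is a minimal sketch that compares whole colon-separated entries exactly and does not normalize trailing slashes or symbolic links.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `dir` appears as one complete entry
 * in the colon-separated search list `list` (for example, the value of
 * LD_LIBRARY_PATH). Entries are compared exactly, so ".../lib/" and
 * ".../lib" are treated as different directories.
 */
bool path_list_contains(const char *list, const char *dir)
{
    size_t dir_len = strlen(dir);

    /* getenv() returns NULL for an unset variable: treat as "not found". */
    if (list == NULL || dir_len == 0)
        return false;

    const char *entry = list;
    for (;;) {
        const char *colon = strchr(entry, ':');
        size_t entry_len = colon ? (size_t)(colon - entry) : strlen(entry);

        if (entry_len == dir_len && strncmp(entry, dir, dir_len) == 0)
            return true;
        if (colon == NULL)
            return false;       /* last entry checked, no match */
        entry = colon + 1;      /* skip the ':' and examine the next entry */
    }
}
```

For example, path_list_contains(getenv("LD_LIBRARY_PATH"), "/usr/apps/tools/papi/4.4.0-forge/lib") (with <stdlib.h> included for getenv) returns true only if that exact directory is listed, so a similar-looking entry such as .../lib64 does not count as a match.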

For all compilers, specify

	-I/usr/apps/tools/papi/<version>/include
at compile time, and specify
	-L/usr/apps/tools/papi/<version>/lib -lpapi
at the link step.


Using PAPI_flops

Perhaps the simplest way to use the PAPI high-level functions, and one that may be sufficient for many users, is to call the routine PAPI_flops (PAPIF_flops in Fortran). This routine, which may be called multiple times from a single-threaded program, measures wall-clock time, CPU time, the number of floating-point instructions executed, and the MFLOP rate.
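The MFLOP rate is a count divided by a time: the number of floating-point instructions divided by the elapsed seconds times 10^6. For example, 2,000,000 floating-point instructions in 0.5 seconds is 2,000,000 / (0.5 × 10^6) = 4 MFLOPS. The C sketch below (mflops_rate is a hypothetical name) shows only that arithmetic; it takes the time interval as a parameter because this page does not say whether PAPI_flops divides by real time or CPU time.

```c
/*
 * Hypothetical helper: MFLOPS = floating-point instructions / (seconds * 1e6).
 * Which elapsed time PAPI_flops itself uses is an assumption left open here;
 * pass whichever interval you want the rate for.
 */
double mflops_rate(long long fp_ins, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;     /* avoid dividing by zero for an empty interval */
    return (double)fp_ins / (seconds * 1.0e6);
}
```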

Here's an example of using PAPI_flops from Fortran:

      include 'f77papi.h'
      real real_time, cpu_time, mflops
      integer*8 fp_ins
      integer ierr

C Call PAPIF_flops to get things started.  This will initialize PAPI
C and start the counters running.  Each of these calls returns an
C error code in the 'ierr' parameter.  See below for details on
C how to manage this.

      call PAPIF_flops(real_time, cpu_time, fp_ins, mflops, ierr)

C Do some computation

      call compute() 

C Read the values in the counters and print them out.  Any call to 
C PAPIF_flops with fp_ins set to the value -1 will reinitialize
C all counters to zero.  You might want to do this in order 
C to individually time different portions of your application.

      call PAPIF_flops(real_time, cpu_time, fp_ins, mflops, ierr)

      write (*,100) real_time, cpu_time, fp_ins, mflops

100   format('           Real time (secs) :', f15.3, 
     +      /'            CPU time (secs) :', f15.3,
     +      /'Floating point instructions :', i15,
     +      /'                     MFLOPS :', f15.3)
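The comments above suggest resetting the counters (passing fp_ins = -1) to time portions of an application separately. Another approach, assuming each PAPIF_flops call returns totals accumulated since the first call rather than per-call values, is to save two readings and subtract them. The C sketch below shows that arithmetic; the struct and function names are hypothetical, and computing the interval MFLOPS against CPU time is an assumption about which time base you want.

```c
/* Hypothetical record of one PAPI_flops reading (times in seconds). */
struct flops_reading {
    float real_time;
    float cpu_time;
    long long fp_ins;
};

/* Hypothetical result for the interval between two cumulative readings. */
struct flops_interval {
    float real_secs;
    float cpu_secs;
    long long fp_ins;
    double mflops;      /* fp_ins / (cpu_secs * 1e6); CPU time is assumed */
};

struct flops_interval flops_between(struct flops_reading start,
                                    struct flops_reading end)
{
    struct flops_interval d;

    /* Subtract the earlier totals from the later ones. */
    d.real_secs = end.real_time - start.real_time;
    d.cpu_secs  = end.cpu_time - start.cpu_time;
    d.fp_ins    = end.fp_ins - start.fp_ins;
    d.mflops    = d.cpu_secs > 0.0f
                ? (double)d.fp_ins / ((double)d.cpu_secs * 1.0e6)
                : 0.0;
    return d;
}
```

For example, readings of (1.0 s real, 0.5 s CPU, 1,000,000 instructions) and (3.0 s, 1.5 s, 5,000,000) give an interval of 2.0 s real, 1.0 s CPU, 4,000,000 instructions, and 4 MFLOPS.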

Using the general PAPI high-level interface

Here's an example in Fortran of using the general high-level PAPI API, which allows you to count any available PAPI events of your choice:
  1. Include the proper PAPI constant definitions:
    	include 'f77papi.h'
    
  2. Declare the events you want to count and other error-related variables, for example:
           integer events (2), numevents, ierr
           character*(PAPI_MAX_STR_LEN) errorstring
    
  3. Declare variables to hold the event counts:
           integer*8 values (2)
    
  4. Set each event to the desired PAPI preset event, as defined in f77papi.h:
           numevents = 2
           events(1) = PAPI_FP_INS
           events(2) = PAPI_TOT_CYC
    
  5. Start and clear the counters:
           call PAPIF_start_counters(events, numevents, ierr)
    
  6. Do some computation, then read and reset them but leave them running:
           call PAPIF_read_counters(values, numevents, ierr)
    
    A similar routine, PAPIF_accum_counters, accepts the same arguments but adds the current values to the running totals already contained in the values array.
  7. Compute some more and then stop the counters and retrieve the values:
           call PAPIF_stop_counters(values, numevents, ierr)
    
  8. Each of those calls returns an error code that you can handle this way:
           if ( ierr .ne. PAPI_OK ) then
    	 call PAPIF_perror(ierr, errorstring, PAPI_MAX_STR_LEN)
    	 print *, errorstring
           endif
    
A similar C sequence is:
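	#include <stdio.h>
	#include <stdlib.h>
	#include <papi.h>

	#define NUMEVENTS 2

	/* Print the PAPI error message and exit if a call did not succeed. */
	static void check(int errorcode, const char *where)
	{
	    char errorstring[PAPI_MAX_STR_LEN+1];

	    if (errorcode != PAPI_OK) {
	        PAPI_perror(errorcode, errorstring, PAPI_MAX_STR_LEN);
	        fprintf(stderr, "PAPI error in %s (%d): %s\n", where, errorcode, errorstring);
	        exit(EXIT_FAILURE);
	    }
	}

	int main(void)
	{
	    /* PAPI_start_counters takes an array of int event codes. */
	    int events[NUMEVENTS] = {PAPI_FP_INS, PAPI_TOT_CYC};
	    long long values[NUMEVENTS];

	    /* Start and clear the counters. */
	    check(PAPI_start_counters(events, NUMEVENTS), "PAPI_start_counters");

	    /* Compute... */

	    /* Read and reset the counters, but leave them running. */
	    check(PAPI_read_counters(values, NUMEVENTS), "PAPI_read_counters");

	    /* Compute some more... */

	    /* Stop the counters and retrieve the counts since the last read. */
	    check(PAPI_stop_counters(values, NUMEVENTS), "PAPI_stop_counters");

	    printf("PAPI_FP_INS:  %lld\nPAPI_TOT_CYC: %lld\n", values[0], values[1]);
	    return 0;
	}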
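Step 6 of the Fortran walkthrough distinguishes reading the counters (copy the counts, then reset) from accumulating them (add the counts to the running totals already in the values array); the C routines PAPI_read_counters and PAPI_accum_counters follow the same pattern, and both reset the counters afterward. The model below does not call PAPI: live is a hypothetical stand-in for the hardware counters, and model_read and model_accum are illustrative names. It only shows what ends up in values after two measured intervals.

```c
#include <stddef.h>

/* Hypothetical model of reading: copy the counts, then reset them. */
void model_read(long long *values, long long *live, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        values[i] = live[i];    /* previous contents of values are lost */
        live[i] = 0;
    }
}

/* Hypothetical model of accumulating: add the counts, then reset them. */
void model_accum(long long *values, long long *live, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        values[i] += live[i];   /* running total across intervals */
        live[i] = 0;
    }
}
```

Because accumulation adds to whatever the values array already holds, initialize that array to zero before the first accumulating call.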


You can count up to five (Intel Core 2) or four (AMD Magny-Cours) events at the same time, or you can "multiplex" the available physical counters over a larger number of events. Please refer to the PAPI web site for instructions on multiplexing.
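With multiplexing, each event occupies a physical counter for only part of the run, so the total reported for it is an estimate extrapolated from the time it was actually counted. The sketch below shows the general idea under a simple linear-scaling assumption, using the "time enabled" and "time running" quantities that the Linux perf_events interface reports. scale_multiplexed is a hypothetical name, and this is not PAPI's internal code; the estimate is only as good as the assumption that the event's rate was roughly constant over the run.

```c
/*
 * Hypothetical helper: estimate a full-run count for an event that was
 * only on a counter for part of the time it was enabled, by scaling the
 * observed count linearly. Rounds to the nearest integer (non-negative
 * inputs assumed).
 */
long long scale_multiplexed(long long observed, double time_enabled,
                            double time_running)
{
    if (time_running <= 0.0)
        return 0;   /* never scheduled on a counter: no estimate possible */
    return (long long)((double)observed * (time_enabled / time_running) + 0.5);
}
```

For example, an event that recorded 1,000 occurrences while on a counter for 1 second of a 4-second measurement is reported as roughly 4,000.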

Certain native hardware events are restricted to a subset of the available counters. The details of this are beyond the scope of this web page; refer to the Intel and AMD manuals for more information. In general though, you don't have to concern yourself with this when accessing counters through the PAPI software; the details are taken care of for you.


Much more detailed information about the hardware performance counters on Intel Core 2 and AMD Magny-Cours processors, including a complete listing of all native events available on these processors, can be found at the vendors' web sites:

Intel® 64 and IA-32 Architectures Software Developer Manuals

Intel® 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A and 3B: System Programming Guide, Parts 1 and 2 (Intel IA-32 and x86-64)

AMD Developer Guides & Manuals

BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors (AMD Magny-Cours processors)


How many hardware performance counters are there on Intel Core 2 and AMD Magny-Cours processors?

There are five counters on the Intel Core 2 processors and four on the AMD Magny-Cours processors (Forge cluster).

Why is PAPIF_flops returning bad numbers for times and MFLOPS? I know they're not correct.

Make sure that you aren't passing double-precision variables. This can happen, for example, if you compile with the -r8 flag, which promotes default REAL variables to double precision. PAPIF_flops expects 32-bit floating-point variables for the time and MFLOPS arguments, so declare the variables you pass to it as real*4.
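Fortran passes arguments by reference, so PAPIF_flops writes 4-byte values into whatever storage you hand it. The C sketch below is a hypothetical illustration of what goes wrong when that storage is actually an 8-byte double, as it is after promotion with -r8: only half of the double's bytes are written, and the resulting bit pattern is not the number that was stored. It assumes a 4-byte float and an 8-byte double; the exact garbage depends on byte order and on what the variable held beforehand.

```c
#include <string.h>

/*
 * Hypothetical illustration: a routine stores a 4-byte float, but the
 * caller's variable is an 8-byte double. Returns what the caller sees.
 */
double float_written_into_double(float value_written)
{
    unsigned char storage[sizeof(double)];
    double seen_by_caller;

    memset(storage, 0, sizeof storage);                      /* caller's old contents */
    memcpy(storage, &value_written, sizeof value_written);   /* callee writes 4 bytes */
    memcpy(&seen_by_caller, storage, sizeof seen_by_caller); /* caller reads 8 bytes */
    return seen_by_caller;
}
```

On a little-endian x86-64 system, float_written_into_double(1.5f) returns a tiny denormal number rather than 1.5, which is the kind of nonsense value described in the question.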

Are there any utilities that allow me to access the performance counters without modifying or relinking my code?

Yes. Here is a synopsis of these utilities:

Recommended (for ease of use)

A command-line utility, psrun, is available on the Forge and Ember clusters. psrun, developed by the PerfSuite project at NCSA, uses PAPI as the underlying support for accessing the performance counters. Run "psrun -h" for brief online help on usage.


You can also check the official PAPI FAQ if we haven't answered your question here.