PAPITopics:Getting Started

Introduction to PAPI

PAPI, the Performance Application Programming Interface, is a machine-independent set of callable routines that provide access to the performance counters on most modern processors. It is pre-installed on several machines available through eecs, including hydra, arc, and VNC. The illustration in this exercise works best on arc (Intel Core2) or VNC (Intel Nehalem). If you're intrepid, you can also install PAPI on your Linux laptop; see the PAPI installation instructions for details.

For further information about PAPI, check the PAPI home page or documentation page.

PAPI High Level Calls

PAPI is implemented in layers. The top layer consists of ten calls which provide a simple interface to PAPI functionality for many applications. An overview of these ten functions can be found in the High Level API section of the PAPI doxygen pages. A single high level call, PAPI_flops, is all that will be needed for this illustration.

Using PAPI to Measure Execution Time

As described in the PAPI_flops documentation, a call to PAPI_flops returns four parameters, discussed below:

  • rtime -- total real time in seconds since the first PAPI_flops() call
  • ptime -- total process time in seconds since the first PAPI_flops() call
  • flpops -- total floating point operations since the first PAPI_flops() call
  • mflops -- Mflop/s achieved since the previous PAPI_flops() call

The values of rtime and ptime correspond to those produced by the low level PAPI functions PAPI_get_real_usec and PAPI_get_virt_usec, which report microseconds rather than seconds. You can stop the counters used by PAPI_flops with a call to PAPI_stop_counters. The next call to PAPI_flops will then start over with fresh values for all returned parameters.
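The link between the microsecond timers and the second-based values can be sketched as a simple unit conversion. This is a minimal illustration, assuming that PAPI_get_real_usec and PAPI_get_virt_usec return 64-bit microsecond counts; the helper name elapsed_seconds is hypothetical and not part of PAPI.

```c
#include <assert.h>

/* Hypothetical helper (not part of PAPI): convert two microsecond
 * timestamps, such as those assumed to come from PAPI_get_real_usec(),
 * into elapsed seconds, the unit PAPI_flops uses for rtime and ptime. */
double elapsed_seconds(long long start_usec, long long end_usec)
{
    assert(end_usec >= start_usec);   /* timestamps must not run backwards */
    return (double)(end_usec - start_usec) / 1.0e6;
}
```

For instance, two timestamps that are 7456 microseconds apart, as in elapsed_seconds(1000000, 1007456), correspond to 0.007456 seconds. The timestamps here are made-up values, not measurements.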

The Source Code

To illustrate the use of PAPI_flops for performance measurement, we provide a simple C routine to multiply two matrices. The source code can be found in the box below. Copy and paste it into a file named PAPI_flops.c. Note that all programs that use PAPI must #include papi.h.

/*
 * A simple example of the use of the high level PAPI_flops call.
 * PAPI_flops measures elapsed time, process time, floating point
 * instructions and MFLOP/s for code bracketed by calls to this routine.
 * For the following matrix multiply you should get 2*(INDEX^3) flpins
 * on Intel Pentium processors.
 */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <string.h>   /* memset */
#include "papi.h"

#define INDEX 100

static void test_fail(char *file, int line, char *call, int retval);

int main(int argc, char **argv) {
  extern void dummy(void *);
  float matrixa[INDEX][INDEX], matrixb[INDEX][INDEX], mresult[INDEX][INDEX];
  float real_time, proc_time, mflops;
  long long flpins;
  int retval;
  int i,j,k;

  /* Initialize the Matrix arrays */
  for ( i=0; i<INDEX; i++ ){
    for ( j=0; j<INDEX; j++ ){
      mresult[i][j] = 0.0;
      matrixa[i][j] = matrixb[i][j] = rand()*(float)1.1;
    }
  }

  /* Setup PAPI library and begin collecting data from the counters */
  if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_flops", retval);

  /* Matrix-Matrix multiply */
  for (i=0;i<INDEX;i++)
    for (j=0;j<INDEX;j++)
      for (k=0;k<INDEX;k++)
        mresult[i][j]=mresult[i][j] + matrixa[i][k]*matrixb[k][j];

  /* Collect the data into the variables passed in */
  if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_flops", retval);

  printf("Real_time:\t%f\nProc_time:\t%f\nTotal flpins:\t%lld\nMFLOPS:\t\t%f\n",
  real_time, proc_time, flpins, mflops);
  printf("%s\tPASSED\n", __FILE__);
  return 0;
}

static void test_fail(char *file, int line, char *call, int retval){
    printf("%s\tFAILED\nLine # %d\n", file, line);
    if ( retval == PAPI_ESYS ) {
        /* System-level failure: report it together with errno's message */
        char buf[128];
        memset( buf, '\0', sizeof(buf) );
        snprintf(buf, sizeof(buf), "System error in %s:", call );
        perror(buf);
    }
    else if ( retval > 0 ) {
        printf("Error calculating: %s\n", call );
    }
    else {
        printf("Error in %s: %s\n", call, PAPI_strerror(retval) );
    }
    exit(1);
}

Running This Example

To try out this example, log on to a machine on which PAPI is installed and save the code above in a file called PAPI_flops.c in your home area. Then run the following command to compile and link it:

UNIX> gcc -I/usr/local/include -O0 PAPI_flops.c /usr/local/lib/libpapi.a -o PAPI_flops

When you run the program, you should get output similar to the following (your mileage may vary):

UNIX> PAPI_flops
Real_time:	0.007456
Proc_time:	0.007423
Total flpins:	2000214
MFLOPS:		269.445282
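The sample output can be sanity-checked against the 2*(INDEX^3) estimate given in the source comment. The sketch below uses plain arithmetic only and does not call PAPI; expected_flops and mflop_rate are hypothetical helper names, and dividing the count by the process time is an assumption about how the reported values relate, not a description of how PAPI computes mflops.

```c
#include <assert.h>

/* Hypothetical helper: floating point operations expected for an
 * n x n matrix multiply (one multiply and one add per inner iteration). */
long long expected_flops(long long n)
{
    assert(n >= 0);
    return 2 * n * n * n;
}

/* Hypothetical helper: rate in Mflop/s from an operation count and a
 * time in seconds. */
double mflop_rate(long long flops, double seconds)
{
    assert(seconds > 0.0);
    return (double)flops / seconds / 1.0e6;
}
```

With INDEX set to 100, expected_flops(100) is 2000000, so the reported count of 2000214 is 214 above the estimate, or about 0.01%. Likewise, mflop_rate(2000214, 0.007423) comes to roughly 269.46, which is close to the reported MFLOPS value.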

Programming on Your Own

To use PAPI_flops in your own code, you can either modify the source code above to suit your needs or copy the relevant pieces into code you have already written. Make sure to #include "papi.h", and remember that passing a -1 value in flpins will reset the counters. Adjust the compile line to suit your needs.

Notes for Fortran

You can refer to the PAPI_flops documentation to get the exact calling syntax for Fortran. You can also refer to the PAPI Fortran page for more general information on calling PAPI routines from Fortran. Remember that the Fortran calls have an extra check parameter at the end to pass back error status. Also keep in mind that a long long value in C (64-bit integer) is an INTEGER*8 in Fortran, and a float in C is a REAL in Fortran. A sample command line to compile and link the Fortran program foo might look like this:

UNIX> f77 foo.f /usr/local/lib/libpapi.a -o foo.out