
I. Preface
II. PAPI Architecture (Internal Design)
III. How to install PAPI onto your system
IV. C and Fortran Calling Interfaces
V. Events
Initialization of a High-Level API
Reading, Adding, and Stopping Counters
Mflops/s, Real Time, and Processor Time
Initialization of a Low-Level API
Starting, Reading, Adding, and Stopping events in an Event Set
Resetting events in an Event Set
Removing events in an Event Set
Emptying and Destroying an Event Set
VII. PAPI Timers
VIII. PAPI System Information
Initialization of Multiplex Support
Converting an Event Set into a Multiplexed Event Set
Using PAPI with Parallel Programs
Initialization of Thread Support
Beginning Overflows in Event Sets
What is Statistical Profiling?
Converting Error Codes to Error Messages
XII. Appendices
Appendix A. Table of Preset Events
Appendix D. PAPI Supported Platforms
Appendix E. Table of Native Encoding for the Various Platforms
Appendix F. Table of Overhead for the Various Platforms
Appendix G. Table for Multiplexing
Appendix H. Table for Overflow
Appendix I. PAPI Supported Tools
XIII. Bibliography

This document is intended to provide the PAPI user with a discussion of how to use the different components and functions of PAPI. The intended users are …
III. HOW TO INSTALL PAPI ONTO YOUR SYSTEM
This section provides an installation guide for PAPI. It states the necessary steps in order to install PAPI on the various supported operating systems.
IV. C AND FORTRAN CALLING INTERFACES
This section states the header files in which the function calls are defined and the form of the function calls for both the C and Fortran calling interfaces. It also provides a table that shows the relation between certain pseudo-types and Fortran variable types.
V. EVENTS
This section explains events in general, as well as native and preset events. The preset query and translation functions are also discussed in this section. There are code examples using native events, preset query, and preset translation, with the corresponding output.
This section discusses the
high-level and low-level interfaces in detail. The initialization and functions
of these interfaces are also discussed. Code examples along with the
corresponding output are included as well.
VII. PAPI TIMERS
This section explains the PAPI
functions associated with obtaining real and virtual time from the platform’s
timers. Code examples along with the corresponding output are included as well.
VIII. PAPI SYSTEM INFORMATION
This section explains the PAPI
functions associated with obtaining hardware and executable information. Code
examples along with the corresponding output are included as well.
This section discusses the advanced features of PAPI, which include multiplexing, threads, MPI, overflows, and statistical profiling. The functions that are used to implement these features are also discussed. Code examples along with the corresponding output are included as well.
This section discusses the various negative error codes that are returned by the PAPI functions. A table with the names, values, and descriptions of the return codes is given, as well as a discussion of the PAPI function that can be used to convert error codes to error messages, along with a code example and the corresponding output.
This section provides information on PAPI's two mailing lists, where users can ask questions about the project.
XII. APPENDICES
These appendices provide various listings and tables, such as a table of preset events and the platforms on which they are supported, a table of PAPI-supported tools, and more information on native events, multiplexing, and overflow.
handle_error(1)
A function that the user should write to handle errors; here it is called with the argument 1.

PAPI is an acronym for Performance Application Programming Interface. The PAPI Project is being developed at the University of Tennessee’s Innovative Computing Laboratory in the Computer Science Department. This project was created to design, standardize, and implement a portable and efficient API (Application Programming Interface) to access the hardware performance counters found on most modern microprocessors.
Hardware counters exist on every major processor today, such as Intel Pentium, IA-64, AMD Athlon, and IBM POWER series. These counters can provide performance tool developers with a basis for tool development and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and most of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms.
These considerations motivated the development of the PAPI Project. Some goals of the PAPI Project are as follows:
· To provide a solid foundation for cross-platform performance analysis tools
· To present a set of standard definitions for performance metrics on all platforms
· To provide a standardized API among users, vendors, and academics
· To be easy to use, well documented, and freely available
The Figure below shows the internal design of the PAPI architecture. In this figure, we can see the two layers of the architecture:
The Portable Layer consists of the API (low level and high level) and machine-independent support functions.
The Machine Specific Layer defines and exports a machine-independent interface to machine-dependent functions and data structures. These functions are defined in the substrate layer, which uses kernel extensions, operating system calls, or assembly language to access the hardware performance counters. PAPI uses the most efficient and flexible of the three, depending on what is available.
PAPI strives to provide a uniform environment across platforms. However, this is not always possible. Where features such as overflow and multiplexing are not supported in hardware, PAPI implements them in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. Therefore, the interface remains constant, but how it is implemented can vary. Throughout this guide, implementation decisions are documented where they can make a difference to the user, such as overhead costs and sampling.
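[Figure: Internal design of the PAPI architecture, showing the Portable Layer and the Machine Specific Layer]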
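The two-layer split described above can be illustrated with a small C sketch in which a portable entry point reaches machine-dependent code only through a table of function pointers. All names here (`struct substrate_ops`, `fake_read`, `fake_substrate`, `portable_read`) are hypothetical and do not come from the PAPI sources; the fake substrate returns fixed values where a real one would read the hardware counters through kernel extensions, operating system calls, or assembly language.

```c
#include <stddef.h>

/* Hypothetical machine-independent interface: the portable layer only
 * ever calls machine-dependent code through these function pointers. */
struct substrate_ops {
    const char *name;
    /* Fill values[0..n-1] with counter readings; return 0 on success. */
    int (*read_counters)(long long *values, size_t n);
};

/* Hypothetical machine-dependent implementation. A real substrate would
 * access the hardware counters; this one returns fixed numbers so the
 * sketch runs anywhere. */
static int fake_read(long long *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        values[i] = (long long)(i + 1) * 1000;
    return 0;
}

static const struct substrate_ops fake_substrate = { "fake", fake_read };

/* Hypothetical portable-layer entry point: the same on every platform;
 * only the substrate table passed in changes. */
int portable_read(const struct substrate_ops *ops, long long *values, size_t n)
{
    if (ops == NULL || ops->read_counters == NULL || values == NULL)
        return -1;
    return ops->read_counters(values, n);
}
```

Supporting another processor in this sketch means supplying another `struct substrate_ops`; code that calls `portable_read` stays the same, which mirrors the constant-interface, varying-implementation point made above.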

On some of the systems that PAPI supports (see Appendix D), you can install PAPI right out of the box without any additional setup. Others require drivers or patches to be installed first.
The general installation steps are below, but first find your particular operating system's section of the /papi/INSTALL file for current information on any additional steps that may be necessary.
General Installation
1. Pick the appropriate Makefile.<arch> for your system in the PAPI source distribution, edit it (if necessary), and compile:
   % make -f Makefile.<arch>
2. Check for errors. Look for libpapi.a and libpapi.so in the current directory. Optionally, run the test programs in the 'ftests' and 'tests' directories. Not all tests will succeed on all platforms.
   % ./run_tests.sh
   This runs the tests in quiet mode, which prints PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the functionality being tested is not supported by that platform.
3. Create a PAPI binary distribution or install PAPI directly.
   To install PAPI directly from the build tree:
   % make -f Makefile.<arch> DESTDIR=<install-dir> install
   Use an absolute pathname for <install-dir>, not a relative pathname.
   To create a binary kit, papi-<arch>.tgz:
   % make -f Makefile.<arch> dist
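PAPI is written in C. The function calls in the C interface are defined in the header file, papi.h, and consist of the following form:
PAPI_function_name(arg1, arg2, …)
The function calls in the Fortran interface are defined in the header file, fpapi.h, and consist of the following form: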
PAPIF_function_name(arg1, arg2, …, check)
The C function calls have equivalent Fortran function calls (PAPI_<call> becomes PAPIF_<call>). This is true for most function calls, except for the functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface or implemented with different calling semantics. In the function calls of the Fortran interface, the return code of the corresponding C routine is returned in the argument check.
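The `check` convention can be pictured with a minimal C sketch of a Fortran-callable wrapper. It assumes the usual Fortran practice of passing arguments by reference, and the routine names `example_call` and `examplef_call` are hypothetical; this is not PAPI's actual wrapper code, and the external symbol name a given Fortran compiler expects (for example, one with a trailing underscore) varies between compilers.

```c
/* Hypothetical C routine standing in for a PAPI_<call>: it returns a
 * status code, 0 for success and a negative value for an error. */
static int example_call(int arg)
{
    return arg >= 0 ? 0 : -1;
}

/* Hypothetical Fortran-callable wrapper for example_call. Fortran passes
 * arguments by reference, so every parameter is a pointer, and the C
 * return code is stored in the extra "check" argument instead of being
 * returned as a function result. */
void examplef_call(int *arg, int *check)
{
    *check = example_call(*arg);
}
```

From Fortran, such a routine would be invoked with an extra INTEGER variable at the end of the argument list and that variable inspected afterward, matching the PAPIF_function_name(arg1, arg2, …, check) form.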
For most
architectures, the following relation holds between the pseudo-types listed and
Fortran variable types:
| Pseudo-type | Fortran type | Description |
|---|---|---|
| C_INT | INTEGER | Default integer type |
| C_FLOAT | REAL | Default real type |
| C_LONG_LONG | INTEGER*8 | Extended size integer |
| C_STRING | CHARACTER*(PAPI_MAX_STR_LEN) | Fortran string |
| C_INT FUNCTION | EXTERNAL INTEGER FUNCTION | Fortran function returning integer result |
Array
arguments must be of sufficient size to hold the input/output from/to the
subroutine for predictable behavior. The array length is indicated either by
the accompanying argument or by internal PAPI definitions.
On most implementations, subroutines accepting a C_STRING argument can read the character string length as provided by Fortran. In these implementations, the string is truncated or space padded as necessary. For other implementations, the character array is assumed to be of sufficient length. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.
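The truncate-or-pad behavior for C_STRING arguments can be sketched in C as follows. The helper name `c_to_fortran_string` is hypothetical and the code is not taken from the PAPI sources; it assumes the Fortran string length is available to C as a separate value, which is how many, but not all, Fortran compilers pass character arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a NUL-terminated C string into a Fortran
 * CHARACTER buffer of length flen. Fortran strings carry no terminator,
 * so the result is truncated to flen characters if src is longer and
 * padded with blanks if it is shorter. */
void c_to_fortran_string(char *fdst, size_t flen, const char *src)
{
    size_t n = strlen(src);
    if (n > flen)
        n = flen;                       /* truncate */
    memcpy(fdst, src, n);
    memset(fdst + n, ' ', flen - n);    /* space pad the remainder */
}
```

Note that the destination is never NUL-terminated, which is correct for a Fortran CHARACTER variable but means the buffer must not be handed back to C string functions as-is.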
For more information on all of the function calls and their descriptions, see Appendix B for the high-level functions and Appendix C for the low-level functions.
Events are occurrences of specific signals related to a processor's function. Hardware performance counters exist as a small set of registers that count events, such as cache misses and floating point operations, while the program executes on the processor. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. Each processor has a number of events that are native to, and often specific to, that architecture. PAPI provides a software abstraction of these architecture-dependent native events into a collection of preset events that are accessible through the PAPI interface.
Native events comprise the set of all events that are countable by the CPU. In many cases, these events will be available through a matching preset PAPI event. Even if no preset event is available, native events can still be accessed directly. These events are intended to be used by people who are very familiar with the particular platform in use. PAPI provides access to native events on all supported platforms through the low-level interface. Native events use the same interface as used when setting up a preset event, but a CPU-specific bit pattern is used instead of the PAPI event definition.
Native encoding is usually:
((register code & 0xffffff) << 8 | (register number & 0xff))
Native encodings are platform dependent, so the above native encoding may or may not work with your platform. To determine the native encoding for your platform, see Appendix E or the README file for your platform in the PAPI source distribution. In addition, the native event lists for the various platforms can be found in the processor architecture manuals.
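The generic encoding formula given earlier can be written as a small set of C helpers. The function names are hypothetical, and the sketch applies only on platforms that actually use this layout (check the README for your platform). It uses unsigned 32-bit arithmetic because shifting a register code such as 0x800000 left by 8 bits sets the top bit, which would overflow a signed 32-bit int.

```c
#include <stdint.h>

/* Generic native encoding:
 *   ((register code & 0xffffff) << 8) | (register number & 0xff)
 * Unsigned arithmetic keeps the shift well defined when the top bit
 * of the result ends up set. */
uint32_t native_encode(uint32_t reg_code, uint32_t reg_num)
{
    return ((reg_code & 0xffffffu) << 8) | (reg_num & 0xffu);
}

/* Inverse operations: recover the two fields from an encoding. */
uint32_t native_reg_code(uint32_t encoding)
{
    return (encoding >> 8) & 0xffffffu;
}

uint32_t native_reg_num(uint32_t encoding)
{
    return encoding & 0xffu;
}
```

For the register code 0x800000 and register number 0x01 mentioned below, `native_encode` returns 0x80000001. Bits outside the masks are silently discarded, so out-of-range inputs do not raise an error.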
Native events are specified as arguments to the low-level function, PAPI_add_event. In the following code example, a native event is added by using PAPI_add_event with the register code = 0x800000 and the register number = 0x01:

For more code examples, see tests/native.c in the papi source distribution.
Preset events, also known as predefined events, are a common set of events deemed relevant and useful for application performance tuning. These events are typically found in many CPUs that provide performance counters and give access to the memory hierarchy, cache coherence protocol events, cycle and instruction counts, functional unit, and pipeline status. Furthermore, preset events are mappings from symbolic names (PAPI preset name) to machine specific definitions (native countable events) for a particular hardware resource. For example, Total Cycles (in user mode) is PAPI_TOT_CYC. Also, PAPI supports presets that may be derived from the underlying hardware metrics. For example, Floating Point Instructions per Second is PAPI_FLOPS. A preset can be either directly available as a single counter, derived using a combination of counters, or unavailable on any particular platform.
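A preset that is derived using a combination of counters can be pictured as arithmetic on the raw values read from two or more native counters. The C sketch below is hypothetical (the enum and function names are not PAPI's) and only illustrates the idea with addition and subtraction; whether a given preset is derived at all, and how, depends on the platform.

```c
/* Hypothetical ways a derived preset can combine two native counts. */
enum derive_op { DERIVE_ADD, DERIVE_SUB };

/* Combine two raw counter values into one derived value. For instance,
 * a total L1 cache-miss count could be formed by adding an L1 data-miss
 * count and an L1 instruction-miss count read from separate counters. */
long long derive_value(enum derive_op op, long long a, long long b)
{
    switch (op) {
    case DERIVE_ADD:
        return a + b;
    case DERIVE_SUB:
        return a - b;
    }
    return 0;   /* not reached for valid op values */
}
```

Because a derived preset consumes more than one physical counter, it can leave fewer counters free for other events than a directly available preset does.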
The PAPI library names approximately 100 preset events, which are defined in the header file, papiStdEventDefs.h. For a given platform, a subset of these preset events can be counted through either a simple high-level programming interface or a more complete C or Fortran low-level interface. For a list and description of all the preset events, see Appendix A.
The exact semantics of an event counter are platform dependent. PAPI preset names are mapped onto available events so as to count as many similar types of events as possible on different platforms. Due to hardware implementation differences, it is not necessarily feasible to directly compare the counts of a particular PAPI event obtained on different hardware platforms. To determine which preset events are available on a specific platform, see Appendix A or run tests/avail.c in the papi source distribution.
The following low-level functions can be called to query whether a preset exists (in other words, whether the hardware supports that preset), to query details about a PAPI event, or to acquire details about all PAPI events, respectively:
C:
PAPI_query_event(EventCode)
PAPI_query_event_verbose(EventCode, info)
PAPI_query_all_events_verbose()
Fortran:
PAPIF_query_event(EventCode, check)
PAPIF_query_event_verbose(EventCode, EventName, EventDescr, EventLabel, avail, EventNote, flags, check)
EventCode -- a defined event, such as PAPI_TOT_INS.
EventName -- the event name, such as the preset name, PAPI_BR_CN.
EventDescr -- a descriptive string for the event of length less than PAPI_MAX_STR_LEN.
EventLabel -- a short descriptive label for the event of length less than 18 characters.
avail -- zero if the event CANNOT be counted.
EventNote -- additional text information about an event (if available).
flags -- provides additional information about an event, e.g., PAPI_DERIVED for an event derived from 2 or more other events.
Note that PAPI_query_all_events_verbose is not implemented in Fortran because it returns a C pointer to an array of C structures.
PAPI_query_event asks the PAPI library whether the PAPI preset event can be counted on this architecture. If the event CAN be counted, the function returns PAPI_OK. If the event CANNOT be counted, the function returns an error code. On some platforms, this function can also be used to check the syntax of a native event.
PAPI_query_event_verbose asks the PAPI library for a copy of an event descriptor. This descriptor can then be used to investigate the details about the event. In Fortran, the individual fields in the descriptor are returned as parameters.
PAPI_query_all_events_verbose asks the PAPI library to return a pointer to an array of event descriptors. The number of objects in the array is PAPI_MAX_PRESET_EVENTS and each object is a descriptor as returned by PAPI_query_event_verbose().