PAPIC:Overview

Intended Audience

Welcome to PAPI, the Performance API. This overview discusses how to use the different components and functions of PAPI. The intended audience includes application developers, performance tool writers, and curious students of performance who wish to access performance data to tune and model application performance. You should have some familiarity with C and Fortran, and a basic knowledge of computer architecture and programming.

Introduction to PAPI

PAPI is an acronym for Performance Application Programming Interface. The PAPI project originated at the University of Tennessee’s Innovative Computing Laboratory, now part of the Department of Electrical Engineering and Computer Science. This project was initiated more than a decade ago to design, standardize, and implement a portable and efficient API to access the hardware performance counters found on modern microprocessors. With the introduction of Component PAPI, or PAPI-C, in early 2010, PAPI extended its reach beyond the CPU and can now monitor system information on a range of components, from CPUs to network interface cards to power monitors and more.

If you're already familiar with PAPI, you may be interested in this summary of the differences between PAPI and PAPI-C.

Background

Hardware performance counters exist on every major processor today. These counters can provide performance tool developers with a basis for tool development and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and many of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms.

These considerations motivated the development of the PAPI project. Some goals of the PAPI Project are as follows:

  • To provide a solid foundation for cross platform performance analysis tools
  • To present a set of standard definitions for performance metrics on all platforms
  • To provide a standardized API among users, vendors, and academics
  • To be easy to use, well documented, and freely available

Architecture

The figure below shows the internal design of the PAPI architecture. In this figure, we can see two layers of the architecture:

[Figure: PAPI Architecture]

The Framework Layer consists of the API (low level and high level) and machine independent support functions.

The Component Layer defines and exports a machine independent interface to machine dependent functions and data structures. These functions are defined in the components, which may use kernel extensions, operating system calls, or assembly language to access the hardware performance counters on a variety of subsystems. PAPI uses the most efficient and flexible of the three, depending on what is available.

PAPI intends to provide a uniform environment across platforms. However, this is not always possible. Where the hardware does not support features such as overflow and multiplexing, PAPI implements them in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. The interface therefore remains constant, but how it is implemented can vary. Throughout this overview, implementation decisions are documented where they can make a difference to the user, for example in overhead costs and sampling.

Installing PAPI

On many systems, including recent Linux kernels (2.6.32+), you can install PAPI right out of the box without any additional setup. Others require drivers or patches to be installed first.

A tarball of the latest version of PAPI can always be found on the Software page of the PAPI website.

Because installation instructions vary from platform to platform, please find your particular Operating System and hardware section in the current INSTALL file for information on exactly how to install PAPI for your configuration.

C and Fortran Calling Interfaces

PAPI is written in C. The function calls in the C interface are declared in the header file papi.h and take the following form:

 
<returned data type> PAPI_function_name(arg1, arg2, …)
 

The function calls in the Fortran interface are declared in the header file fpapi.h and take the following form:

 
PAPIF_function_name(arg1, arg2, …, check) 
 

As you can see, the C function calls have equivalent Fortran function calls (PAPI_<call> becomes PAPIF_<call>). This is generally true for most function calls, except for the functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface, or implemented with different calling semantics. In the function calls of the Fortran interface, the return code of the corresponding C routine is returned in the argument, check.

For most architectures, the following relation holds between the pseudo-types listed and Fortran variable types:

Pseudo-type      Fortran type                    Description
C_INT            INTEGER                         Default Integer type
C_FLOAT          REAL                            Default Real type
C_LONG_LONG      INTEGER*8                       Extended size integer
C_STRING         CHARACTER*(PAPI_MAX_STR_LEN)    Fortran string
C_INT FUNCTION   EXTERNAL INTEGER FUNCTION       Fortran function returning integer result

For predictable behavior, array arguments must be large enough to hold the data passed to or returned from the subroutine. The array length is indicated either by the accompanying argument or by internal PAPI definitions.

On most implementations, subroutines accepting C_STRING as an argument can read the character string length as provided by Fortran. In these implementations, the string is truncated or space padded as necessary. For other implementations, the character array is assumed to be of sufficient size. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.
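
To make "truncated or space padded" concrete, here is a minimal sketch in plain C of how a C string might be copied into a fixed-length Fortran CHARACTER buffer. The helper name c_to_fortran_string is hypothetical and is not part of PAPI; it only illustrates the convention, under the assumption that the Fortran length is known to the callee.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PAPI function): copy a NUL-terminated C string
 * into a fixed-length Fortran CHARACTER buffer of fortran_len bytes.
 * Longer strings are truncated; shorter ones are padded with spaces.
 * Fortran strings carry no NUL terminator, so none is written. */
void c_to_fortran_string(const char *src, char *dst, size_t fortran_len)
{
    size_t n = strlen(src);
    if (n > fortran_len)
        n = fortran_len;                     /* truncate to the Fortran length */
    memcpy(dst, src, n);
    memset(dst + n, ' ', fortran_len - n);   /* pad the remainder with spaces */
}
```

With an 8-character buffer, "abc" would be stored as "abc" followed by five spaces, and "PAPI_TOT_CYC" would be cut down to "PAPI_TOT".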

Example Code

Throughout this overview are a number of blocks of example code. It is our intention that this code will be executable by simply copying it into a file, compiling it, and linking it to the PAPI library. Many code blocks will reference an external error handling function called handle_error(). One implementation of such a function is shown below:

 
#include <stdlib.h>
#include <stdio.h>
#include <papi.h>

void handle_error (int retval)
{
     printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
     exit(1);
}

We have developed and tested these examples assuming a Linux and gcc toolchain. Your environment may differ and require appropriate adaptation. To compile the above error handler, assuming that the file containing it is in the same directory as papi.h, use a command line similar to:

gcc -I. -c handle_error.c

To compile a test program under the same conditions, use a command line like:

gcc -I. example.c handle_error.o libpapi.a -o example

If you encounter example code that will not compile and run, please let us know. Keeping our examples up to date is an ongoing process.

Events

PAPI counts events that occur on a CPU or other subsystem. There are usually more events to be measured than counter registers to count them in, so PAPI also provides the means to map events to counters. To learn more about events, click here, or on the title above.
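
Mapping events to counters is not always trivial, because a given event can often be counted only on some of the available registers. The sketch below shows one simple way to think about the problem, as a small backtracking search. It is an illustration only, not PAPI's allocation code: the function names and the bitmask representation (bit c set means "this event can use counter c") are assumptions made for this example.

```c
#include <stdint.h>

/* Try to place events idx..n-1. `used` has one bit set per counter that is
 * already taken. Returns 1 and fills assignment[] on success, 0 otherwise. */
static int place(const uint32_t *allowed, int n, int idx,
                 uint32_t used, int *assignment)
{
    if (idx == n)
        return 1;                                /* every event has a counter */

    uint32_t candidates = allowed[idx] & ~used;  /* free counters this event may use */
    for (int c = 0; c < 32; c++) {
        uint32_t bit = UINT32_C(1) << c;
        if (candidates & bit) {
            assignment[idx] = c;
            if (place(allowed, n, idx + 1, used | bit, assignment))
                return 1;
            /* otherwise fall through and try the next candidate counter */
        }
    }
    return 0;                                    /* no counter works for this event */
}

/* Hypothetical entry point: allowed[i] is the bitmask of counters that can
 * count event i. On success, assignment[i] holds the counter chosen for
 * event i and the function returns 1; if no valid mapping exists it returns 0. */
int map_events_to_counters(const uint32_t *allowed, int n, int *assignment)
{
    return place(allowed, n, 0, 0, assignment);
}
```

For example, if the first event can use counters 0 and 1 but the second can use only counter 0, a greedy first-fit would fail, while the search above backs up and places the first event on counter 1.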

In addition to the events that are native to each component, PAPI defines a set of preset events that are standardized across all CPU components. To facilitate the discovery of supported events, PAPI provides query functions to inquire about the availability of specified events. Events are often referred to by name, but internally PAPI uses an opaque code to specify an event. Translation functions are provided to convert between names and codes. For convenience, event codes for a specific component can be collected into event sets. A variety of functions are available to manage event sets. Additionally, a number of options can be set, either for the behavior of the whole library, or for an individual event set.

All of these features are described in greater detail below.

Native Events

Native events comprise the set of all events that are available for a specific component. For CPUs, there are generally far more native events available than can be mapped onto PAPI preset events. For other components, native events are generally the only option available. Click here, or on the title above for more information on native events and examples of their use.

Preset Events

Preset events, also known as predefined events, are a common set of CPU events deemed relevant and useful for application performance tuning. PAPI defines a set of about 100 preset events for CPUs, which can be found here. A given CPU will implement a subset of those, often no more than several dozen. Although the names and calling semantics of preset events are standardized across platforms, the exact definitions are determined by the underlying hardware. Caveat emptor. For more details on preset events and examples of their use, click here, or on the title above.

Event Query

Several low-level functions can be called to learn more about preset or native events.

PAPI_query_event reports whether a given event is implemented on a given platform, returning PAPI_OK if it is and an error code otherwise;

PAPI_get_event_info returns a structure containing information about a specific event; and

PAPI_enum_event returns the next event in a sequence given the event code of a specific event. This function is useful for enumerating over a list of events.

For more details on these functions and examples of their use, click here, or on the title above.

Event Translation

A preset or native event can be referenced by name or by event code. Most PAPI functions require an event code, while most user input and output is in terms of names. Two low-level functions are provided to translate between these formats. They are discussed with usage examples here or by clicking on the title above.

Event Sets

Event Sets are user-defined collections of hardware events (preset or native), which are measured together to provide meaningful information. Events in an Event Set must all belong to a single component. Multiple Event Sets can be defined at the same time, but only one per component can be active. For details on managing Event Sets, including function calls and example code, click here or on the title above.

Getting and Setting Options

There are a number of options that can globally affect the operation of the entire PAPI library or locally affect a specific event set. These options can be reviewed and set by calling a pair of low-level functions, as described in more detail here and via the title above.

PAPI Counter Interfaces

High Level API

The high level API (Application Programming Interface) lets you start, stop, and read the counters for preset events on the CPU only. It is designed for simplicity, not flexibility. For more details on the 8 functions available in the High Level API, click here or on the title above.

Low Level API

The low-level API (Application Programming Interface) manages hardware events in user-defined groups called Event Sets. It is meant for experienced application programmers and tool developers wanting fine-grained measurement and control of the PAPI interface. It provides access to both PAPI preset and native events, and supports all installed components. For more details on the Low Level API, click here or on the title above.

PAPI Timers

PAPI provides four functions to measure time in microseconds or cycles for either real (wall clock) time or virtual (process) time. These timers use the most accurate timers available on the platform in use. More information on these routines can be found here or by clicking the title above.

PAPI System Information

This section explains the PAPI functions associated with obtaining hardware and executable information. Code examples along with the corresponding output are included as well.

Advanced PAPI Features

PAPI supports a number of advanced features beyond simple event counting. You can learn more about these advanced topics by following the title links below.

Multiplexing

Hardware performance counters are generally a scarce resource. There are often many more events of interest than counters to count them on. Multiplexing is one way around this dilemma, but it doesn't come without trade-offs. Click here or the title above to learn more.
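
One of the main trade-offs is accuracy. When events take turns on a counter, each event is only observed for part of the run, and a full-run count has to be estimated. The sketch below shows the usual linear extrapolation for such time-sliced counting. It is a general illustration and not PAPI's internal code; the function name and parameters are hypothetical.

```c
/* Hypothetical helper (not a PAPI function): extrapolate a full-interval
 * count from a time-sliced measurement.
 *   raw_count    events seen while this event actually held a counter
 *   total_time   length of the whole measurement interval
 *   counted_time portion of that interval during which the event was counted
 * Both times must be in the same unit (for example microseconds). */
long long estimate_multiplexed_count(long long raw_count,
                                     long long total_time,
                                     long long counted_time)
{
    if (counted_time <= 0)
        return 0;   /* the event never got a time slice: nothing to extrapolate */
    return (long long)((double)raw_count * (double)total_time
                       / (double)counted_time);
}
```

An event that was counted for 25 of 100 time units and saw 1000 occurrences would be estimated at 4000. The estimate assumes the event rate is roughly uniform over the interval, which is why multiplexed counts on short or bursty code regions should be treated with caution.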

Parallel Programming

PAPI can be used with parallel as well as serial programs. For a discussion of issues that come up in threaded or multiprocess environments, click here or the title above.

Overflow

Most processors can generate an interrupt when a performance counter exceeds a threshold value. PAPI allows you to attach an interrupt handler to that occurrence so you can perform periodic activities where the period is determined by an event other than time. Learn more by clicking here or the title above.
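
The idea of a threshold-driven handler can be modeled in plain C without any hardware support. The sketch below polls a counter value and calls a user handler once for every threshold that has been crossed. All names here are hypothetical and exist only for this illustration; this is not the PAPI overflow interface, and it does not show how the hardware or PAPI delivers overflow interrupts.

```c
#include <stddef.h>

/* Hypothetical types for this sketch; they are not PAPI's own. */
typedef void (*soft_overflow_handler_t)(long long crossed_at, void *user_data);

typedef struct {
    long long threshold;       /* fire every `threshold` events; must be > 0 */
    long long next_trigger;    /* counter value at which the handler fires next */
    soft_overflow_handler_t handler;
    void *user_data;
} soft_overflow_t;

/* Call with the latest counter reading. Invokes the handler once for each
 * threshold crossed since the previous call and returns how many times
 * it fired. */
int soft_overflow_check(soft_overflow_t *s, long long count)
{
    int fired = 0;
    if (s->threshold <= 0 || s->handler == NULL)
        return 0;
    while (count >= s->next_trigger) {
        s->handler(s->next_trigger, s->user_data);
        s->next_trigger += s->threshold;
        fired++;
    }
    return fired;
}

/* Example handler: counts how many overflows have been delivered. */
void count_overflows(long long crossed_at, void *user_data)
{
    (void)crossed_at;              /* unused in this simple handler */
    (*(int *)user_data)++;
}
```

Because the handler runs once per threshold crossing, the period of the activity is measured in events rather than in time: with a threshold of 1000, the handler fires at counts 1000, 2000, 3000, and so on.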

Statistical Profiling

By using the overflow capabilities of PAPI, it is possible to create profiles of the distribution of various performance events across a selected address space. Learn more by clicking here or on the title above.
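
Conceptually, such a profile is a histogram: each overflow reports the address at which it occurred, and that address is binned into a bucket covering a slice of the selected address range. The sketch below shows that binning step in plain C. The profile_t structure and profile_record function are hypothetical names chosen for this example and do not reflect PAPI's profiling data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile buffer for this sketch. */
typedef struct {
    uintptr_t start;         /* first address covered by the profile */
    size_t    bucket_bytes;  /* bytes of address space per bucket; must be > 0 */
    size_t    nbuckets;      /* number of buckets in hits[] */
    unsigned *hits;          /* one hit counter per bucket */
} profile_t;

/* Record one overflow that occurred at `address`.
 * Returns 1 if the address fell inside the profiled range, 0 otherwise. */
int profile_record(profile_t *p, uintptr_t address)
{
    if (address < p->start)
        return 0;                                  /* below the profiled range */
    size_t idx = (address - p->start) / p->bucket_bytes;
    if (idx >= p->nbuckets)
        return 0;                                  /* above the profiled range */
    p->hits[idx]++;
    return 1;
}
```

After a run, buckets with many hits point at the regions of code where the profiled event, for example cache misses, occurs most often.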

Address Restriction

Some processors, particularly the Intel Itanium, offer the capability to restrict the address range over which performance events can be generated. This makes it possible, for example, to collect cache miss events for individual data structures. Learn more here or by clicking the title above.

PAPI Error Handling

Sometimes things don't go as planned. Most PAPI routines will tell you when that happens. It's always a good idea to check if things worked and let someone know if they didn't. To learn more about the return codes that PAPI provides, and how to turn them into meaningful messages, click here or the title above.

Many of the code snippets in this Overview and in the PAPI man pages refer to a routine called handle_error. One possible implementation of this routine is shown here.

PAPI Utilities

A collection of simple utility commands is available in the src/utils directory. See individual utilities for details on usage.

Utility Name         Description
papi_avail           provides availability and detail information for PAPI preset events
papi_clockres        measures and reports the latency and resolution of the PAPI timers
papi_cost            measures the execution time cost of basic PAPI operations
papi_command_line    executes PAPI preset or native events from the command line
papi_decode          decodes PAPI preset events into a CSV format suitable for PAPI_encode_events
papi_event_chooser   given a list of named events, lists other events that can be counted with them
papi_mem_info        provides information on the memory architecture of the current processor
papi_native_avail    provides detailed information for PAPI native events

Additional PAPI Information

A great deal of additional information is available on the PAPI website. Some of it is accessible through the links below.

Home Page               The home page of the PAPI project website
Mailing Lists           How to subscribe to the general and developers mailing lists
Forum                   A place where users can help each other
CVS Tree                Interactively view the PAPI source code or download a snapshot of the latest codebase
Doxygen Documentation   A complete overview of PAPI from a programmer's perspective
Software Downloads      Recent releases of PAPI
Supported Platforms     Where will PAPI run?
Bibliography            Recent articles about PAPI