  PAPIC:Overview
Intended Audience

Welcome to PAPI, the Performance API. This overview discusses how to use the different components and functions of PAPI. The intended audience includes application developers, performance tool writers, and curious students of performance who wish to access performance data to tune and model application performance. You should have some familiarity with C and Fortran, and a basic knowledge of computer architecture and programming.

Introduction to PAPI

PAPI is an acronym for Performance Application Programming Interface. The PAPI project originated at the University of Tennessee’s Innovative Computing Laboratory, now part of the Department of Electrical Engineering and Computer Science. This project was initiated more than a decade ago to design, standardize, and implement a portable and efficient API to access the hardware performance counters found on modern microprocessors. With the introduction of Component PAPI, or PAPI-C, in early 2010, PAPI has extended its reach beyond the CPU and can now monitor system information on a range of components, from CPUs to network interface cards to power monitors and more. If you're already familiar with PAPI, you may be interested in the summary of the differences between PAPI and PAPI-C.

Background

Hardware performance counters exist on every major processor today. These counters can provide performance tool developers with a basis for tool development, and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and many of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms. These considerations motivated the development of the PAPI project. Some goals of the PAPI Project are as follows:
Architecture

The figure below shows the internal design of the PAPI architecture. In this figure, we can see two layers of the architecture. The Framework Layer consists of the API (low level and high level) and machine-independent support functions. The Component Layer defines and exports a machine-independent interface to machine-dependent functions and data structures. These functions are defined in the components, which may use kernel extensions, operating system calls, or assembly language to access the hardware performance counters on a variety of subsystems. PAPI uses the most efficient and flexible of the three, depending on what is available.

PAPI intends to provide a uniform environment across platforms. However, this is not always possible. Where hardware support for features such as overflow and multiplexing is not available, PAPI implements the features in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. The interface therefore remains constant, but how it is implemented can vary. Throughout this overview, implementation decisions are documented where they can make a difference to the user, such as overhead costs and sampling.

Installing PAPI

On many systems, including recent Linux kernels (2.6.32+), you can install PAPI right out of the box without any additional setup. Others require drivers or patches to be installed first. A tarball of the latest version of PAPI can always be found on the Software page of the PAPI website. Because installation instructions vary from platform to platform, please find the section for your particular operating system and hardware in the current INSTALL file for information on exactly how to install PAPI for your configuration.

C and Fortran Calling Interfaces

PAPI is written in C.
The function calls in the C interface are defined in the header file papi.h and take the following form:

<returned data type> PAPI_function_name(arg1, arg2, …)

The function calls in the Fortran interface are defined in the header file fpapi.h and take the following form:

PAPIF_function_name(arg1, arg2, …, check)

As you can see, the C function calls have equivalent Fortran function calls (PAPI_<call> becomes PAPIF_<call>). This is true for most function calls, except for the functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface or implemented with different calling semantics. In the function calls of the Fortran interface, the return code of the corresponding C routine is returned in the argument check.

For most architectures, the following relation holds between the pseudo-types listed and Fortran variable types:
Array arguments must be of sufficient size to hold the input/output from/to the subroutine for predictable behavior. The array length is indicated either by the accompanying argument or by internal PAPI definitions. Subroutines accepting C_STRING as an argument are, on most implementations, capable of reading the character string length as provided by Fortran. In these implementations, the string is truncated or space padded as necessary. For other implementations, the length of the character array is assumed to be of sufficient size. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.

Example Code

Throughout this overview are a number of blocks of example code. It is our intention that this code will be executable by simply copying it into a file, compiling it, and linking it to the PAPI library. Many code blocks reference an external error handling function called handle_error. One possible implementation is:
#include <stdlib.h>
#include <stdio.h>
#include <papi.h>
/* Print the PAPI error code with its message, then exit. */
void handle_error(int retval)
{
    printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
    exit(1);
}
We have developed and tested these examples assuming a Linux and gcc toolchain. Your environment may differ and require appropriate adaptation. To compile the above error handler, assuming that the file containing it is in the same directory as papi.h, use a command line similar to:

gcc -I. -c handle_error.c

To compile a test program under the same conditions, use a command line like:

gcc -I. example.c handle_error.o libpapi.a -o example

If you encounter example code that will not compile and run, please let us know. Keeping our examples up to date is an ongoing process.

Events

PAPI counts events that occur on a CPU or other subsystem. There are usually more events to be measured than counter registers to count them in, so PAPI also provides the means to map events to counters. To learn more about events, see the Events page of this documentation.

In addition to the events that are native to each component, PAPI defines a set of preset events that are standardized across all CPU components. To facilitate the discovery of supported events, PAPI provides query functions to inquire about the availability of specified events. Events are often referred to by name, but internally PAPI uses an opaque code to specify an event. Translation functions are provided to convert between names and codes. For convenience, event codes for a specific component can be collected into event sets. A variety of functions are available to manage event sets. Additionally, a number of options can be set, either for the behavior of the whole library or for an individual event set. All of these features are described in greater detail below.

Native Events

Native events comprise the set of all events that are available for a specific component. For CPUs, there are generally far more native events available than can be mapped onto PAPI preset events. For other components, native events are generally the only option available.
See the Native Events page for more information on native events and examples of their use.

Preset Events

Preset events, also known as predefined events, are a common set of CPU events deemed relevant and useful for application performance tuning. PAPI defines a set of about 100 preset events for CPUs. A given CPU will implement a subset of those, often no more than several dozen. Although the names and calling semantics of preset events are standardized across platforms, the exact definitions are determined by the underlying hardware. Caveat emptor. For more details on preset events and examples of their use, see the Preset Events page.

Event Query

Several low-level functions can be called to learn more about preset or native events. PAPI_query_event returns PAPI_OK if a given event is implemented on the current platform and an error code otherwise; PAPI_get_event_info returns a structure containing information about a specific event; and PAPI_enum_event returns the next event in a sequence given the event code of a specific event, which makes it useful for enumerating over a list of events. For more details on these functions and examples of their use, see the Event Query page.

Event Translation

A preset or native event can be referenced by name or by event code. Most PAPI functions require an event code, while most user input and output is in terms of names. Two low-level functions are provided to translate between these formats. They are discussed with usage examples on the Event Translation page.

Event Sets

Event Sets are user-defined collections of hardware events (preset or native) that are measured together to provide meaningful information. Events in an Event Set must all belong to a single component. Multiple Event Sets can be defined at the same time, but only one per component can be active. For details on managing Event Sets, including function calls and example code, see the Event Sets page.
Getting and Setting Options

There are a number of options that can globally affect the operation of the entire PAPI library or locally affect a specific event set. These options can be reviewed and set by calling a pair of low-level functions, as described in more detail on the Getting and Setting Options page.

PAPI Counter Interfaces

High Level API

The high-level API (Application Programming Interface) lets you start, stop, and read the counters for preset events on the CPU only. It is designed for simplicity, not flexibility. For more details on the 8 functions available in the High Level API, see the High Level API page.

Low Level API

The low-level API (Application Programming Interface) manages hardware events in user-defined groups called Event Sets. It is meant for experienced application programmers and tool developers wanting fine-grained measurement and control of the PAPI interface. It provides access to both PAPI preset and native events, and supports all installed components. For more details on the Low Level API, see the Low Level API page.

PAPI Timers

PAPI provides four functions to measure time in microseconds or cycles for either real (wall clock) time or virtual (process) time. These functions use the most accurate timers available on the platform in use. More information on these routines can be found on the PAPI Timers page.

PAPI System Information

This section explains the PAPI functions associated with obtaining hardware and executable information. Code examples along with the corresponding output are included as well.

Advanced PAPI Features

PAPI supports a number of advanced features beyond simple event counting. You can learn more about these advanced topics by following the titles below.

Multiplexing

Hardware performance counters are generally a scarce resource. There are often many more events of interest than counters to count them on. Multiplexing is one way around this dilemma. It doesn't come without trade-offs.
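One such trade-off is accuracy: an event that shares a counter is only observed for part of the run, so its reported total is an estimate. The arithmetic behind such an estimate can be sketched as below. This is a simplified model that assumes plain time-slicing with proportional scaling; estimate_full_count is a hypothetical name, and the code is not taken from the PAPI implementation.

```c
/*
 * Hypothetical helper (not part of PAPI): scale a count observed during
 * part of a run up to an estimate for the whole run.
 *
 *   raw_count     events counted while this event held a counter
 *   time_counted  time the event actually held a counter
 *   time_total    total measured time (same unit as time_counted)
 */
long long estimate_full_count(long long raw_count,
                              long long time_counted,
                              long long time_total)
{
    if (time_counted <= 0)
        return 0;  /* the event never held a counter: nothing to scale */

    /* Assume the event rate during unobserved time matched the observed rate. */
    return (long long)((double)raw_count * (double)time_total
                       / (double)time_counted);
}
```

For example, 1000 events seen while holding a counter for 25 of 100 time units scale to an estimate of 4000. The estimate is only as good as the assumption that the event rate is steady, so short runs or bursty code can give misleading totals.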
See the Multiplexing page to learn more.

Parallel Programming

PAPI can be used with parallel as well as serial programs. For a discussion of issues that come up in threaded or multiprocess environments, see the Parallel Programming page.

Overflow

Most processors can generate an interrupt when a performance counter exceeds a threshold value. PAPI allows you to attach an interrupt handler to that occurrence, so you can perform periodic activities where the period is determined by an event other than time. Learn more on the Overflow page.

Statistical Profiling

By using the overflow capabilities of PAPI, it is possible to create profiles of the distribution of various performance events across a selected address space. Learn more on the Statistical Profiling page.

Address Restriction

Some processors, particularly the Intel Itanium, can restrict the address range over which performance events are generated. This makes it possible, for example, to collect cache miss events for individual data structures. Learn more on the Address Restriction page.

PAPI Error Handling

Sometimes things don't go as planned. Most PAPI routines will tell you when that happens. It's always a good idea to check whether things worked and let someone know if they didn't. To learn more about the return codes that PAPI provides, and how to turn them into meaningful messages, see the PAPI Error Handling page. Many of the code snippets in this Overview and in the PAPI man pages refer to a routine called handle_error. One possible implementation of this routine is shown in the Example Code section.

PAPI Utilities

A collection of simple utility commands is available in the src/utils directory. See individual utilities for details on usage.
Additional PAPI Information

A great deal of additional information is available on the PAPI website. Some of it is accessible through the links below.