PAPI USER’S GUIDE
TABLE OF CONTENTS

I.                    Preface

Intended Audience

Organization of This Document

Document Convention

II.                 Introduction to PAPI

What is PAPI?

PAPI Background/Motivation

PAPI Architecture (Internal Design)

III.               How to install PAPI onto your system

IV.       C and Fortran Calling Interfaces

V.        Events

What are Events?

            Native Events

What are Native Events?

Preset Events

What are Preset Events?

Preset Query

Preset Translation

VI.              PAPI’S Counter Interfaces

High-Level API

What is a High-Level API?

Initialization of a High-Level API

Reading, Adding, and Stopping Counters

Mflops/s, Real Time, and Processor Time

Low-Level API

What is a Low-Level API?

Initialization of a Low-Level API

Event Sets

What are Event Sets?

Creating an Event Set

Adding events to an Event Set

Starting, Reading, Adding, and Stopping events in an Event Set

Resetting events in an Event Set

Removing events in an Event Set

Emptying and Destroying an Event Set

The State of an Event Set

Getting and Setting Options

Simple Code Examples

High-Level API

Low-Level API

VII.            PAPI Timers

Real Time

Virtual Time

VIII.     PAPI System Information

Executable Information

Hardware Information

IX.       Advanced PAPI Features

Multiplexing

What is Multiplexing?

Using PAPI with Multiplexing

Initialization of Multiplex Support

Converting an Event Set into a Multiplexed Event Set

Issues of Multiplexing

Using PAPI with Parallel Programs

Threads

What are Threads?

Initialization of Thread Support

Thread ID

MPI

Overflow

What is an Overflow?

Beginning Overflows in Event Sets

Address of the Overflow

Statistical Profiling

What is Statistical Profiling?

Generating a PC Histogram

X.                 PAPI Error Handling

Error Codes

Converting Error Codes to Error Messages

XI.              PAPI Mailing Lists

XII.      Appendices

Appendix A. Table of Preset Events

Appendix B. High-Level API

Appendix C. Low-Level API

Appendix D. PAPI Supported Platforms

                        Appendix E. Table of Native Encoding for the Various Platforms

Appendix F. Table of Overhead for the Various Platforms

Appendix G. Table for Multiplexing

Appendix H.  Table for Overflow

Appendix I.  PAPI Supported Tools 

XIII.         Bibliography


PREFACE
INTENDED AUDIENCE

This document is intended to provide the PAPI user with a discussion of how to use the different components and functions of PAPI. The intended users are a ...

ORGANIZATION OF THIS DOCUMENT

III. HOW TO INSTALL PAPI ONTO YOUR SYSTEM

 

This section provides an installation guide for PAPI. It states the necessary steps in order to install PAPI on the various supported operating systems.

 

IV. C AND FORTRAN CALLING INTERFACES

This section states the header files in which function calls are defined and the form of the function calls for both the C and Fortran calling interfaces. Also, it provides a table that shows the relation between certain pseudo-types and Fortran variable types.

V. EVENTS

This section provides an explanation of events as well as an explanation of native and preset events. The preset query and translation functions are also discussed in this section. There are code examples using native events, preset query, and preset translation with the corresponding output.

VI. PAPI COUNTER INTERFACES

This section discusses the high-level and low-level interfaces in detail. The initialization and functions of these interfaces are also discussed. Code examples along with the corresponding output are included as well.

VII. PAPI TIMERS

This section explains the PAPI functions associated with obtaining real and virtual time from the platform’s timers. Code examples along with the corresponding output are included as well.

VIII. PAPI SYSTEM INFORMATION

This section explains the PAPI functions associated with obtaining hardware and executable information. Code examples along with the corresponding output are included as well.

IX. ADVANCED PAPI FEATURES

This section discusses the advanced features of PAPI, which include multiplexing, threads, MPI, overflows, and statistical profiling. The functions that are used to implement these features are also discussed. Code examples along with the corresponding output are included as well.

X. PAPI ERROR HANDLING

This section discusses the various negative error codes that are returned by the PAPI functions. A table with the names, values, and descriptions of the return codes is given, followed by a discussion of the PAPI function that converts error codes to error messages and a code example with the corresponding output.

XI. PAPI MAILING LISTS

This section provides information on PAPI's two mailing lists, where users can ask questions about the project.

XII. APPENDICES

These appendices provide various listings and tables, such as a table of preset events and the platforms on which they are supported, a table of PAPI supported tools, and more information on native events, multiplexing, and overflow.

DOCUMENT CONVENTION

handle_error(1)

 

            A user-written error-handling function, called here with the argument 1. The code examples in this guide call it whenever a PAPI function fails.
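
The convention above leaves handle_error to the reader. The following is a minimal sketch, assuming the only goal is to report the failing code and stop the program. The helper name format_error and the message wording are illustrative and are not part of PAPI.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PAPI): writes a short message for
 * the given code into buf and returns the length snprintf reports. */
int format_error(char *buf, size_t len, int retval)
{
    return snprintf(buf, len, "PAPI error %d", retval);
}

/* User-written handler in the style used by the examples in this
 * guide: print the message to stderr and terminate the program. */
void handle_error(int retval)
{
    char msg[64];

    format_error(msg, sizeof msg, retval);
    fprintf(stderr, "%s\n", msg);
    exit(1);
}
```

Keeping the formatting in a separate helper is a design choice made here so that the message can be checked without terminating the program; a handler that only prints and exits would serve equally well.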

 


INTRODUCTION TO PAPI
WHAT IS PAPI?

PAPI is an acronym for Performance Application Programming Interface. The PAPI Project is being developed at the University of Tennessee’s Innovative Computing Laboratory in the Computer Science Department. This project was created to design, standardize, and implement a portable and efficient API (Application Programming Interface) to access the hardware performance counters found on most modern microprocessors.

BACKGROUND

Hardware counters exist on every major processor today, such as Intel Pentium, IA-64, AMD Athlon, and IBM POWER series. These counters can provide performance tool developers with a basis for tool development and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and most of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms.

These considerations motivated the development of the PAPI Project.  Some goals of the PAPI Project are as follows:

· To provide a solid foundation for cross platform performance analysis tools

· To present a set of standard definitions for performance metrics on all platforms

· To provide a standardized API among users, vendors, and academics

· To be easy to use, well documented, and freely available

ARCHITECTURE

The Figure below shows the internal design of the PAPI architecture. In this figure, we can see the two layers of the architecture:

The Portable Layer consists of the API (low level and high level) and machine independent support functions.

The Machine Specific Layer defines and exports a machine independent interface to machine dependent functions and data structures. These functions are defined in the substrate layer, which uses kernel extensions, operating system calls, or assembly language to access the hardware performance counters. PAPI uses the most efficient and flexible of the three, depending on what is available.

PAPI strives to provide a uniform environment across platforms. However, this is not always possible. Where hardware support for features such as overflows and multiplexing is missing, PAPI implements the features in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. Therefore, the interface remains constant, but how it is implemented can vary. Throughout this guide, implementation decisions are documented where they can make a difference to the user, such as overhead costs and sampling.
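
To make the layering concrete, here is a rough sketch of the idea. It is not PAPI's actual source. A portable function dispatches through a per-platform table of function pointers and falls back to a software path when the hardware hook is absent. Every name in it (substrate_t, portable_overflow, sw_overflow, demo_hw_overflow) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical machine-specific layer: one table per platform.
 * A NULL hook means the hardware does not support the feature. */
typedef struct {
    const char *name;
    int (*hw_overflow)(int counter, long long threshold);
} substrate_t;

enum { USED_HARDWARE = 1, USED_SOFTWARE = 2 };

/* Hypothetical software emulation, used when the hook is missing. */
int sw_overflow(int counter, long long threshold)
{
    (void)counter;
    (void)threshold;
    return 0;
}

/* Hypothetical hardware hook for a platform that has the feature. */
int demo_hw_overflow(int counter, long long threshold)
{
    (void)counter;
    (void)threshold;
    return 0;
}

/* Hypothetical portable-layer call: the interface is the same on
 * every platform, only the path underneath differs. Returns which
 * path was taken, or -1 if that path reported a failure. */
int portable_overflow(const substrate_t *s, int counter, long long threshold)
{
    if (s->hw_overflow != NULL)
        return s->hw_overflow(counter, threshold) == 0 ? USED_HARDWARE : -1;
    return sw_overflow(counter, threshold) == 0 ? USED_SOFTWARE : -1;
}
```

The caller sees one function regardless of platform, which mirrors the point that the interface stays constant while the implementation varies.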


 


 


HOW TO INSTALL PAPI ONTO YOUR SYSTEM

On some of the systems that PAPI supports (see Appendix D), you can install PAPI right out of the box without any additional setup. Others require drivers or patches to be installed first.

The general installation steps are below, but first find your operating system's section of the /papi/INSTALL file for current information on any additional steps that may be necessary.

General Installation

1. Pick the appropriate Makefile.<arch> for your system in the papi source distribution, edit it (if necessary), and compile.

      % make -f Makefile.<arch>

2. Check for errors. Look for libpapi.a and libpapi.so in the current directory. Optionally, run the test programs in the ‘ftests’ and ‘tests’ directories. Not all tests will succeed on all platforms.

      % ./run_tests.sh

   This will run the tests in quiet mode, which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the functionality being tested is not supported by that platform.

3. Create a PAPI binary distribution or install PAPI directly.

   To directly install PAPI from the build tree:

      % make -f Makefile.<arch> DESTDIR=<install-dir> install

   Please use an absolute pathname for <install-dir>, not a relative pathname.

   To create a binary kit, papi-<arch>.tgz:

      % make -f Makefile.<arch> dist

C AND FORTRAN CALLING INTERFACES

 

 

 


PAPI is written in C. The function calls in the C interface are declared in the header file papi.h and use the prefix PAPI_. The function calls in the Fortran interface are defined in the header file fpapi.h and have the following form:

 

PAPIF_function_name(arg1, arg2, …, check)

 

Most C function calls have an equivalent Fortran call (PAPI_<call> becomes PAPIF_<call>). The exceptions are functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface or implemented with different calling semantics. In the Fortran interface, the return code of the corresponding C routine is returned in the argument check.
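
The check convention can be pictured from the C side. The sketch below is illustrative only and does not reproduce PAPI's wrapper code. It assumes the common Fortran convention of passing arguments by reference. The names c_call and f_call are hypothetical.

```c
/* Hypothetical C routine: returns 0 on success and a negative
 * value on error, in the style of a C status code. */
int c_call(int value)
{
    return value >= 0 ? 0 : -1;
}

/* Fortran-style wrapper: every argument is passed by reference and
 * the status of the C routine is stored in the trailing check
 * argument instead of being returned. */
void f_call(int *value, int *check)
{
    *check = c_call(*value);
}
```

A Fortran caller would then test check after the call, just as a C caller tests the function's return value.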

For most architectures, the following relation holds between the pseudo-types listed and Fortran variable types:

Pseudo-type       Fortran type                    Description
C_INT             INTEGER                         Default Integer type
C_FLOAT           REAL                            Default Real type
C_LONG_LONG       INTEGER*8                       Extended size integer
C_STRING          CHARACTER*(PAPI_MAX_STR_LEN)    Fortran string
C_INT FUNCTION    EXTERNAL INTEGER FUNCTION       Fortran function returning integer result

Array arguments must be of sufficient size to hold the input/output from/to the subroutine for predictable behavior. The array length is indicated either by the accompanying argument or by internal PAPI definitions.

Subroutines accepting C_STRING as an argument are on most implementations capable of reading the character string length as provided by Fortran. In these implementations, the string is truncated or space padded as necessary. For other implementations, the length of the character array is assumed to be of sufficient size. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.
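
The truncate-or-pad behavior described above can be sketched in C as follows. This shows how such a copy might look; it is not PAPI's implementation, and the function name copy_to_fortran_string is hypothetical.

```c
#include <string.h>

/* Copy a NUL-terminated C string into a Fortran character buffer of
 * exactly flen characters. Fortran strings carry no terminator, so
 * the text is truncated if it is too long and padded with spaces if
 * it is too short. */
void copy_to_fortran_string(char *fstr, size_t flen, const char *cstr)
{
    size_t n = strlen(cstr);

    if (n > flen)
        n = flen;                     /* truncate to the buffer size */
    memcpy(fstr, cstr, n);            /* copy the text itself */
    memset(fstr + n, ' ', flen - n);  /* fill the remainder with spaces */
}
```

For example, copying "PAPI" into an 8-character buffer leaves "PAPI" followed by four spaces, while copying "PAPI_TOT_CYC" into a 3-character buffer leaves "PAP".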

For more information on all of the function calls and their descriptions, see Appendix B for the high-level functions and Appendix C for the low-level functions.


EVENTS


WHAT ARE EVENTS?

Events are occurrences of specific signals related to a processor’s function. Hardware performance counters exist as a small set of registers that count events, such as cache misses and floating point operations, while the program executes on the processor. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. Each processor has a number of events that are native to, and often specific to, that architecture. PAPI provides a software abstraction of these architecture-dependent native events into a collection of preset events that are accessible through the PAPI interface.

NATIVE EVENTS

WHAT ARE NATIVE EVENTS?

 

Native events comprise the set of all events that are countable by the CPU. In many cases, these events will be available through a matching preset PAPI event. Even if no preset event is available, native events can still be accessed directly. These events are intended to be used by people who are very familiar with the particular platform in use. PAPI provides access to native events on all supported platforms through the low-level interface. Native events use the same interface as preset events, but a CPU-specific bit pattern is used instead of the PAPI event definition.

 

Native encoding is usually:

((register code & 0xffffff) << 8 | (register number & 0xff))

 

Native encodings are platform dependent, so the above native encoding may or may not work with your platform. To determine the native encoding for your platform, see Appendix E or the README file for your platform in the PAPI source distribution. In addition, the native event lists for the various platforms can be found in the processor architecture manual.
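
The usual encoding quoted above can be written as a small pair of helpers. This is a sketch under the assumption that unsigned int is at least 32 bits wide and that your platform follows the usual layout. The function names are hypothetical and are not part of PAPI.

```c
/* Pack a register code and a register number using the usual layout:
 * the low 24 bits of the code are shifted up by 8 bits and the low
 * 8 bits hold the register number. Unsigned arithmetic keeps the
 * shift well defined when the top bit of the result is set. */
unsigned int encode_native(unsigned int code, unsigned int regnum)
{
    return ((code & 0xffffffu) << 8) | (regnum & 0xffu);
}

/* Recover the register code from an encoded value. */
unsigned int native_code(unsigned int encoded)
{
    return (encoded >> 8) & 0xffffffu;
}

/* Recover the register number from an encoded value. */
unsigned int native_regnum(unsigned int encoded)
{
    return encoded & 0xffu;
}
```

With register code 0x800000 and register number 0x01, encode_native returns 0x80000001.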

 

Native events are specified as arguments to the low-level function, PAPI_add_event. In the following code example, a native event is added by using PAPI_add_event with the register code = 0x800000 and the register number = 0x01:

 

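#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

/* User-written error handler (see DOCUMENT CONVENTION) */
void handle_error(int retval);

int main(void)
{
int retval, EventSet = PAPI_NULL;
unsigned int native = 0x0;

/* Initialize the library */
retval = PAPI_library_init(PAPI_VER_CURRENT);

if (retval != PAPI_VER_CURRENT) {
  printf("PAPI library init error!\n");
  exit(1);
}

/* Create an empty event set */
if (PAPI_create_eventset(&EventSet) != PAPI_OK)
   handle_error(1);

/* Build the native event code from register code 0x800000 and
   register number 0x01; unsigned constants keep the shift well defined */
native = ((0x800000u & 0xffffffu) << 8) | (0x01u & 0xffu);

/* Add the native event */
if (PAPI_add_event(&EventSet, native) != PAPI_OK)
   handle_error(1);

return 0;
}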
 

 


For more code examples, see tests/native.c in the papi source distribution.

PRESET EVENTS

WHAT ARE PRESET EVENTS?

 

Preset events, also known as predefined events, are a common set of events deemed relevant and useful for application performance tuning. These events are typically found in many CPUs that provide performance counters and give access to the memory hierarchy, cache coherence protocol events, cycle and instruction counts, functional unit, and pipeline status. Furthermore, preset events are mappings from symbolic names (PAPI preset name) to machine specific definitions (native countable events) for a particular hardware resource. For example, Total Cycles (in user mode) is PAPI_TOT_CYC. Also, PAPI supports presets that may be derived from the underlying hardware metrics. For example, Floating Point Instructions per Second is PAPI_FLOPS. A preset can be either directly available as a single counter, derived using a combination of counters, or unavailable on any particular platform.

 

The PAPI library names approximately 100 preset events, which are defined in the header file, papiStdEventDefs.h. For a given platform, a subset of these preset events can be counted through either a simple high-level programming interface or a more complete C or Fortran low-level interface. For a list and description of all the preset events, see Appendix A.

 

The exact semantics of an event counter are platform dependent. PAPI preset names are mapped onto available events so that as many similar types of events as possible can be counted on different platforms. Due to hardware implementation differences, it is not necessarily feasible to directly compare the counts of a particular PAPI event obtained on different hardware platforms. To determine which preset events are available on a specific platform, see Appendix A or run tests/avail.c in the papi source distribution.

PRESET QUERY

The following low-level functions can be called to query about the existence of a preset (in other words, if the hardware supports that certain preset), to query details about a PAPI event, or to acquire details about all PAPI events, respectively:

 

C:

PAPI_query_event(EventCode)

PAPI_query_event_verbose(EventCode, info)

PAPI_query_all_events_verbose()

 

Fortran:

PAPIF_query_event(EventCode, check)

PAPIF_query_event_verbose(EventCode, EventName, EventDescr, EventLabel, avail, EventNote, flags, check)

 

ARGUMENTS

EventCode -- a defined event, such as PAPI_TOT_INS.

EventName -- the event name, such as the preset name, PAPI_BR_CN.

EventDescr -- a descriptive string for the event of length less than PAPI_MAX_STR_LEN.

EventLabel -- a short descriptive label for the event of length less than 18 characters.

avail -- zero if the event CANNOT be counted.

EventNote -- additional text information about an event (if available).

flags -- provides additional information about an event, e.g., PAPI_DERIVED for an event derived from 2 or more other events.

Note that PAPI_query_all_events_verbose is not implemented in Fortran because it returns a C pointer to an array of C structures.

 

PAPI_query_event asks the PAPI library if the PAPI Preset event can be counted on this architecture. If the event CAN be counted, the function returns PAPI_OK. If the event CANNOT be counted, the function returns an error code. On some platforms, this function also can be used to check the syntax of a native event.
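
A typical use of this query is to filter a list of candidate events down to those the platform can count. The sketch below assumes only what is stated above: the query returns PAPI_OK on success and an error code otherwise. To keep it compilable without PAPI installed, the query is passed in as a function pointer and a stub stands in for PAPI_query_event. The names filter_available, query_fn, stub_query, and QUERY_OK are hypothetical.

```c
#include <stddef.h>

#define QUERY_OK 0   /* stands in for PAPI_OK in this sketch */

/* Signature shared by the real query function and by stubs. */
typedef int (*query_fn)(int event_code);

/* Copy into out[] the codes from candidates[] that the query
 * function reports as countable, and return how many were kept.
 * out[] must have room for n entries. */
size_t filter_available(const int *candidates, size_t n,
                        query_fn query, int *out)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (query(candidates[i]) == QUERY_OK)
            out[kept++] = candidates[i];
    }
    return kept;
}

/* Stub standing in for PAPI_query_event: pretends that only even
 * codes can be counted. Purely illustrative. */
int stub_query(int event_code)
{
    return (event_code % 2 == 0) ? QUERY_OK : -1;
}
```

In a program linked against PAPI, the same function could be called with PAPI_query_event as the query argument and preset codes such as PAPI_TOT_INS as candidates, after the library has been initialized.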

PAPI_query_event_verbose asks the PAPI library for a copy of an event descriptor. This descriptor can then be used to investigate the details about the event. In Fortran, the individual fields in the descriptor are returned as parameters.

 

PAPI_query_all_events_verbose asks the PAPI library to return a pointer to an array of event descriptors. The number of objects in the array is PAPI_MAX_PRESET_EVENTS and each object is a descriptor as returned by PAPI_query_event_verbose().
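
To show how the descriptor fields listed under ARGUMENTS might be used together, the sketch below formats a one-line summary of an event. The structure event_info_t is a simplified stand-in built from those field names and is not PAPI's own descriptor type. FLAG_DERIVED is a hypothetical placeholder for PAPI_DERIVED, and the label text in the usage note is invented for illustration.

```c
#include <stdio.h>

#define FLAG_DERIVED 0x1u   /* hypothetical stand-in for PAPI_DERIVED */

/* Simplified stand-in for an event descriptor. */
typedef struct {
    const char *name;    /* EventName, e.g. a preset name */
    const char *label;   /* EventLabel, a short label */
    int avail;           /* zero if the event cannot be counted */
    unsigned int flags;  /* additional information, e.g. derived */
} event_info_t;

/* Write "<name> (<label>): available|unavailable[, derived]" into
 * buf and return the length snprintf reports. */
int summarize_event(char *buf, size_t len, const event_info_t *e)
{
    return snprintf(buf, len, "%s (%s): %s%s",
                    e->name, e->label,
                    e->avail ? "available" : "unavailable",
                    (e->flags & FLAG_DERIVED) ? ", derived" : "");
}
```

For a descriptor with name "PAPI_FLOPS", label "FLOPS", a nonzero avail, and the derived flag set, the summary reads "PAPI_FLOPS (FLOPS): available, derived".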