Overflow & Statistical Profiling



An overflow happens when the number of occurrences of a particular hardware event exceeds a specified threshold. PAPI provides the ability to call user-defined handlers when an overflow occurs. This can be done in hardware, if the processor generates an interrupt signal when the counter reaches a specified value, or in software, by setting up a high-resolution interval timer and installing a timer interrupt handler. For software-based overflow, PAPI compares the current counter value against the threshold every time the timer interrupt occurs. If the current value exceeds the threshold, the user’s handler is called from within the signal context with some additional arguments. These arguments allow the user to determine which event overflowed, by how much it overflowed, and at what location in the source code the overflow occurred.

Using the same mechanism as for user-programmable overflow, PAPI also guards against register precision overflow of counter values. Each counter can potentially be incremented multiple times in a single clock cycle. This fact, combined with increasing clock speeds and the small dynamic range of some of the physical counters, means that an overflow is likely to occur on platforms where 64-bit counters are not supported in hardware or by the operating system. In those cases, PAPI implements 64-bit counters in software using the same mechanism that handles overflow dispatch.


An event set can begin registering overflows by calling the following low-level function:


PAPI_overflow(EventSet, EventCode, threshold, flags, handler)
EventSet  -- a reference to the event set to use 
EventCode -- the event to be used for overflow detection 
threshold -- the overflow threshold value to use 
flags     -- bit map that controls the overflow mode of operation. 
             The only currently valid setting is PAPI_OVERFLOW_FORCE_SW, 
             which overrides the default hardware overflow setting on a 
             platform that supports hardware overflow.
handler   -- the handler function to call upon overflow 

This function marks a specific EventCode in an EventSet to generate an overflow signal each time threshold events have been counted. Multiple events within an event set can be programmed to overflow by making successive calls to this function, but only a single overflow handler can be registered. To turn off overflow for a specific event, call PAPI_overflow with EventCode set to the desired event and threshold set to zero.

The handler function is a user-supplied callback routine that performs whatever special processing is needed to handle the overflow interrupt, including distinguishing multiple overflowing events from each other. It must conform to the following prototype:


PAPI_overflow_handler(EventSet, address, overflow_vector, void *context)
EventSet        -- a reference to the event set in use 
address         -- the address of the program counter when the overflow occurred 
overflow_vector -- a 64-bit vector that specifies which counter(s) generated the overflow.
                   Bit 0 corresponds to counter 0. The handler should be able to deal with 
                   multiple overflow bits per call if more than one event may be set to overflow.
context         -- a platform dependent structure containing information about the state 
                   of the machine when the overflow occurred. This structure is provided for 
                   completeness, but can generally be ignored by most users.

In the following code example, PAPI_overflow is used to mark PAPI_TOT_INS in order to generate an overflow signal after every 100,000 counted events:

#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

#define THRESHOLD  100000

int total = 0;      /* total overflows */

/* Called by PAPI each time PAPI_TOT_INS crosses the threshold */
void handler(int EventSet, void *address, long_long overflow_vector, void *context)
{
    fprintf(stderr, "handler(%d) Overflow at %p! vector=0x%llx\n",
            EventSet, address, overflow_vector);
    total++;
}

int main(void)
{
    int retval, EventSet = PAPI_NULL;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT)
        exit(1);

    /* Create the EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        exit(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        exit(1);

    /* Call handler every 100000 instructions */
    retval = PAPI_overflow(EventSet, PAPI_TOT_INS, THRESHOLD, 0, handler);
    if (retval != PAPI_OK)
        exit(1);

    /* Start counting */
    if (PAPI_start(EventSet) != PAPI_OK)
        exit(1);

    /* ... code to be measured goes here ... */

    return 0;
}

On success, PAPI_overflow returns PAPI_OK; on error, it returns a non-zero error code.

For more code examples, see ctests/overflow.c, ctests/overflow_twoevents.c or ctests/overflow_pthreads.c in the PAPI source distribution.



Statistical profiling involves periodically interrupting a running program and examining the program counter at the time of the interrupt. If this is done for a reasonable number of interrupting intervals, the resulting program counter distribution will be statistically representative of the execution profile of the program with respect to the interrupting event. Performance tools like UNIX prof sample the program address with respect to time and hash the value into a histogram. At program completion, the histogram is analyzed and associated with symbolic information contained in the executable. GNU prof in conjunction with the -p option of the GCC compiler performs exactly this analysis using the process time as the interrupting trigger. PAPI aims to generalize this functionality so that a histogram can be generated using any countable hardware event as the basis for the interrupt signal.


A PC histogram can be generated on any countable event by calling either of the following low-level functions:


PAPI_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags)
PAPI_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags)





buf       -- pointer to the profile buffer array. 
bufsiz    -- number of entries in buf. 
offset    -- starting value of the lowest memory address to profile. 
scale     -- scaling factor for bin values. 
EventSet  -- the PAPI EventSet to profile when it is started. 
EventCode -- code of the event in the EventSet to profile. 
threshold -- threshold value for the event that triggers the handler. 
flags     -- bit pattern to control profiling behavior. 

The defined bit values for the flags variable are shown in the table below:

Defined bit            Description
PAPI_PROFIL_POSIX      Default type of profiling.
PAPI_PROFIL_RANDOM     Drop a random 25% of the samples.
PAPI_PROFIL_WEIGHTED   Weight the samples by their value.
PAPI_PROFIL_COMPRESS   Ignore samples if hash buckets get big.
PAPI_PROFIL_BUCKET_16  Save samples in 16-bit hash buckets.
PAPI_PROFIL_BUCKET_32  Save samples in 32-bit hash buckets.
PAPI_PROFIL_BUCKET_64  Save samples in 64-bit hash buckets.
PAPI_PROFIL_FORCE_SW   Force software overflow in profiling.

PAPI_sprofil takes two arguments in place of the first four of PAPI_profil:

prof      -- pointer to a PAPI_sprofil_t structure. 
profcnt   -- number of buffers for hardware profiling (reserved).

PAPI_profil creates a histogram of overflow counts for a specified region of the application code by using its first four parameters to create the data structures needed by PAPI_sprofil and then calling PAPI_sprofil to do the work. PAPI_sprofil assumes a pre-initialized PAPI_sprofil_t structure and enables profiling for the EventSet based on its value. Note that the EventSet must be in the stopped state in order for either call to succeed.

More than one hardware event can be profiled at the same time by making multiple independent calls to these functions for the same EventSet before calling PAPI_start. This can be useful for the simultaneous generation of profiles of two or more related events, for example L1 cache misses and L2 cache misses. Profiling can be turned off for specific events by calling the function for that event with a threshold of zero.

On success, these functions return PAPI_OK; on error, they return a non-zero error code.

For more code examples, see profile.c, profile_twoevents.c or sprofile.c in the ctests directory of the PAPI source distribution.

For a more extensive description of the parameters in the PAPI_profil call, see the PAPI_profil man page or its HTML counterpart at: http://icl.cs.utk.edu/projects/papi/files/html_man3/papi_profil.html

In the following code example, PAPI_profil is used to generate a PC histogram:

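#include <papi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int retval;
    int EventSet = PAPI_NULL;
    unsigned long start, end, length;
    const PAPI_exe_info_t *prginfo;
    unsigned short *profbuf;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT && retval > 0) {
        fprintf(stderr, "PAPI library version mismatch!\n");
        exit(1);
    }
    if (retval < 0)
        exit(1);

    /* Look up the bounds of the program's text segment */
    if ((prginfo = PAPI_get_executable_info()) == NULL)
        exit(1);

    /* In PAPI 3 and later these fields are reached through address_info */
    start = (unsigned long)prginfo->address_info.text_start;
    end = (unsigned long)prginfo->address_info.text_end;
    length = end - start;

    /* Allocate and zero the histogram buffer */
    profbuf = (unsigned short *)malloc(length * sizeof(unsigned short));
    if (profbuf == NULL)
        exit(1);
    memset(profbuf, 0x00, length * sizeof(unsigned short));

    /* Create the EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        exit(1);

    /* Add Total FP Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_FP_INS) != PAPI_OK)
        exit(1);

    /* Sample every 1000000 FP instructions. The flags argument was cut off
       in the original listing; POSIX profiling with 16-bit buckets is
       assumed here to match the unsigned short buffer. */
    if (PAPI_profil(profbuf, length, (void *)start, 65536, EventSet, PAPI_FP_INS, 1000000,
                    PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16) != PAPI_OK)
        exit(1);

    /* Start counting */
    if (PAPI_start(EventSet) != PAPI_OK)
        exit(1);

    /* ... code to be profiled goes here ... */

    return 0;
}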