Component PAPI, or PAPI-C, is an attempt to make the PAPI performance monitoring programming interface available for more than just the hardware performance counters found on the cpu. Performance counters are appearing in a number of other components of high performance computing systems, such as network or memory controllers, power or temperature monitors, and even the specialized processing units that may be included in future multicore processor implementations.
The primary technical challenge for PAPI is to sever the very tight coupling between the hardware-independent layer of PAPI code and the hardware-specific code needed to interface with the counters, and to do so without sacrificing performance. Second, once these two layers have been functionally separated, the hardware-independent (Framework) layer must be modified to support multiple hardware-dependent substrate layers, or Components, simultaneously.
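One common way to achieve this kind of separation is a table of function pointers that the framework dispatches through without knowing anything about the counters behind it. The following is a minimal sketch under that assumption only: the toy_ names, the structure layout and the counter values are hypothetical and are not PAPI's actual internal interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component interface; names are illustrative only and do
   not reproduce PAPI's internal structures. */
typedef struct {
    const char *name;
    int num_counters;
    /* Reads counter 'idx' into *value; returns 0 on success. */
    int (*read_counter)(int idx, long long *value);
} toy_component_t;

/* Toy "cpu" component: returns made-up values for illustration. */
static int toy_cpu_read(int idx, long long *value) {
    *value = 1000LL * (idx + 1);
    return 0;
}

/* Toy "network" component: also returns made-up values. */
static int toy_net_read(int idx, long long *value) {
    *value = 42LL + idx;
    return 0;
}

/* Component table owned by the framework; index 0 is the cpu by convention. */
static const toy_component_t toy_components[] = {
    { "cpu", 4, toy_cpu_read },
    { "net", 2, toy_net_read },
};

#define TOY_NUM_COMPONENTS \
    ((int)(sizeof toy_components / sizeof toy_components[0]))

/* Hardware-independent framework entry point: it validates arguments and
   dispatches through the table without knowing any counter details.
   Returns -1 for an invalid component or counter index. */
static int toy_framework_read(int cidx, int idx, long long *value) {
    if (cidx < 0 || cidx >= TOY_NUM_COMPONENTS)
        return -1;
    if (idx < 0 || idx >= toy_components[cidx].num_counters)
        return -1;
    assert(toy_components[cidx].read_counter != NULL);
    return toy_components[cidx].read_counter(idx, value);
}
```

In this model, adding a component means adding one table entry; the framework function itself does not change.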
These changes cannot be accomplished without some modification of the PAPI user interface. We have tried to keep these modifications minimal and transparent, and have preserved most backward compatibility for applications and tools that only need access to the cpu counters. We have introduced a small number of new APIs and features to support the new abstraction of multiple components, and have modified the behavior of some existing APIs and data structures to support a multi-Component landscape. These changes are tabulated at the bottom of this document and discussed below.
This release of PAPI-C is a technology pre-release, implemented and tested on a small number of platforms and components. The platforms include Intel Pentium III, Pentium 4, Core2Duo and Itanium (I and II), and AMD Opteron. Work is underway to port PAPI-C to the other platforms currently supported by PAPI. Components currently available besides the cpu component include an ACPI component for monitoring temperature where available, a Myrinet MX component, and a 'toy' component that monitors network traffic as reported by the Linux/Unix /sbin/ifconfig command.
One of the key organizing data structures in PAPI is the EventSet, which serves as a repository for all the events and settings that define a counting regime. EventSets are created, modified, added to, deleted from, and disposed of over the life of a PAPI counting session. In traditional PAPI, multiple EventSets can exist simultaneously, but only one can be active at any time. PAPI-C extends the concept of an EventSet by binding it to a specific numbered Component; this component index determines which component the EventSet is paired with. Multiple EventSets can now be defined and running simultaneously, but at most one EventSet per Component can be running at a time. We have adopted a late-binding model for associating an EventSet with a Component: no changes are needed in the API call for creating an EventSet, and the set is bound to a Component when the first event is added. Any additional events must then belong to the same Component. Occasionally it is desirable to modify the settings of an EventSet before any event is added; for this case a new API, PAPI_assign_eventset_component(), has been introduced to make the binding explicit.
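The late-binding rules can be modeled in a few lines of C. This is a hypothetical sketch, not PAPI source: the toy_ functions and the event-set structure are invented for illustration, a component is identified only by its integer index, and the rule that rebinding to a different component is rejected is an assumption. It models three rules from the text: creation leaves the set unbound, the first added event (or an explicit assignment) binds it, and at most one set per component may be running.

```c
#include <assert.h>

#define TOY_UNBOUND        (-1)
#define TOY_MAX_COMPONENTS 16

/* Hypothetical, stripped-down event set: tracks only its binding. */
typedef struct {
    int cidx;        /* bound component index, or TOY_UNBOUND */
    int num_events;
    int running;
} toy_eventset_t;

/* 1 while some event set is running on that component. */
static int toy_component_busy[TOY_MAX_COMPONENTS];

static void toy_create_eventset(toy_eventset_t *es) {
    es->cidx = TOY_UNBOUND;   /* creation does not bind to a component */
    es->num_events = 0;
    es->running = 0;
}

/* Explicit binding before any event is added, the role described for
   PAPI_assign_eventset_component(). Assumed here: rebinding a set to a
   different component is rejected. */
static int toy_assign_component(toy_eventset_t *es, int cidx) {
    if (cidx < 0 || cidx >= TOY_MAX_COMPONENTS)
        return -1;
    if (es->cidx != TOY_UNBOUND && es->cidx != cidx)
        return -1;
    es->cidx = cidx;
    return 0;
}

/* Late binding: the first event binds the set to its component; later
   events must belong to that same component. */
static int toy_add_event(toy_eventset_t *es, int event_cidx) {
    assert(event_cidx >= 0 && event_cidx < TOY_MAX_COMPONENTS);
    if (es->cidx == TOY_UNBOUND)
        es->cidx = event_cidx;
    else if (es->cidx != event_cidx)
        return -1;   /* mixing components in one set is rejected */
    es->num_events++;
    return 0;
}

/* At most one running event set per component. */
static int toy_start(toy_eventset_t *es) {
    if (es->cidx == TOY_UNBOUND || es->running || toy_component_busy[es->cidx])
        return -1;
    toy_component_busy[es->cidx] = 1;
    es->running = 1;
    return 0;
}

static int toy_stop(toy_eventset_t *es) {
    if (!es->running)
        return -1;
    toy_component_busy[es->cidx] = 0;
    es->running = 0;
    return 0;
}
```

With this model, a cpu event set and a network event set can run at the same time, while a second cpu event set must wait until the first one is stopped.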
For now, PAPI Preset events are only defined for the cpu component, which by convention is always component 0. Since these event names and codes are available directly in papi.h, they continue to work with no modifications. Event codes for other components are always mapped to native events available on that component, and are bound to the component by a 4-bit component ID field embedded in the event code itself. These codes cannot be determined a priori, since they are opaque identifiers meaningful only to PAPI. They must be obtained by calling PAPI_event_name_to_code(), which searches all available native event tables and returns a properly encoded value if the event exists. As described above, the first event added binds an EventSet to a Component, and all subsequently added events must belong to the same Component.
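To illustrate how a framework can route an opaque event code to its owning component, here is a hypothetical C sketch. The bit position of the 4-bit field (bits 26 to 29), the event names and all toy_ functions are assumptions made for illustration; the text above describes these codes as opaque, so applications should obtain them only through PAPI_event_name_to_code() and never build or decode them by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, for illustration only: a 4-bit component ID in
   bits 26..29 of a 32-bit code. Not PAPI's documented layout. */
#define TOY_CMP_SHIFT 26u
#define TOY_CMP_MASK  (0xFu << TOY_CMP_SHIFT)
#define TOY_MAX_CMPS  16u          /* a 4-bit field allows 16 components */
#define TOY_NUM_CMPS  2u

/* Invented native event tables, one per component (NULL-terminated). */
static const char *const toy_native_names[TOY_NUM_CMPS][3] = {
    { "CPU_CYCLES",   "CPU_INSTRUCTIONS", NULL },   /* component 0 */
    { "NET_RX_BYTES", "NET_TX_BYTES",     NULL },   /* component 1 */
};

/* Stamps the owning component's ID onto a component-local event index. */
static uint32_t toy_encode_event(uint32_t cidx, uint32_t native_idx) {
    assert(cidx < TOY_MAX_CMPS);
    assert((native_idx & TOY_CMP_MASK) == 0);   /* must not overlap the ID */
    return (cidx << TOY_CMP_SHIFT) | native_idx;
}

/* Recovers the owning component, e.g. to route the event to it. */
static uint32_t toy_component_of(uint32_t code) {
    return (code & TOY_CMP_MASK) >> TOY_CMP_SHIFT;
}

/* Strips the component ID, leaving the component-local index. */
static uint32_t toy_native_index(uint32_t code) {
    return code & ~TOY_CMP_MASK;
}

/* Name lookup across every component's table, in the spirit of
   PAPI_event_name_to_code(). Returns 0 and sets *code if found, else -1. */
static int toy_name_to_code(const char *name, uint32_t *code) {
    for (uint32_t c = 0; c < TOY_NUM_CMPS; c++)
        for (uint32_t i = 0; toy_native_names[c][i] != NULL; i++)
            if (strcmp(toy_native_names[c][i], name) == 0) {
                *code = toy_encode_event(c, i);
                return 0;
            }
    return -1;
}
```

Because the ID field is 4 bits wide, any scheme of this kind can distinguish at most 16 components.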
A number of changes were made to support the housekeeping associated with multiple Components. A new API, PAPI_num_components(), returns the number of active components in the current library, and PAPI_get_component_info() replaces PAPI_get_substrate_info() to provide detailed information about a given component. As mentioned above, the cpu component is always assumed to exist and is always assigned component index 0. Component 0 is also relied on to provide the high resolution timer functionality behind PAPI_get_real_cyc(), PAPI_get_virt_cyc(), PAPI_get_real_usec() and PAPI_get_virt_usec(). PAPI_num_hwctrs() still functions as it did in traditional PAPI, returning the number of physical cpu counters; it is complemented by the new PAPI_num_cmp_hwctrs(), which returns the number of counters for a specified component.
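The relationship between the per-component and the legacy counter queries can be sketched as follows. This is a hypothetical model, not PAPI code: the toy_ functions only mirror the roles of PAPI_num_components(), PAPI_num_cmp_hwctrs() and PAPI_num_hwctrs(), and the component names and counter counts in the table are invented.

```c
#include <assert.h>

/* Hypothetical per-component record, standing in for the kind of data
   PAPI_get_component_info() returns; fields and values are invented. */
typedef struct {
    const char *name;
    int num_cntrs;
} toy_cmp_info_t;

static const toy_cmp_info_t toy_cmp_table[] = {
    { "cpu",  4 },   /* component 0 is always the cpu */
    { "acpi", 2 },
    { "net",  8 },
};

/* Plays the role of PAPI_num_components() in this sketch. */
static int toy_num_components(void) {
    return (int)(sizeof toy_cmp_table / sizeof toy_cmp_table[0]);
}

/* Plays the role of PAPI_num_cmp_hwctrs(): -1 for an invalid index. */
static int toy_num_cmp_hwctrs(int cidx) {
    if (cidx < 0 || cidx >= toy_num_components())
        return -1;
    return toy_cmp_table[cidx].num_cntrs;
}

/* Plays the role of PAPI_num_hwctrs(): keeps its old meaning by always
   reporting component 0, the cpu. */
static int toy_num_hwctrs(void) {
    assert(toy_num_components() > 0);   /* the cpu component must exist */
    return toy_num_cmp_hwctrs(0);
}

/* A loop a tool might write to total the counters of every component. */
static int toy_total_counters(void) {
    int total = 0;
    for (int c = 0; c < toy_num_components(); c++)
        total += toy_num_cmp_hwctrs(c);
    return total;
}
```

Code written for traditional PAPI keeps calling the legacy query and still sees only the cpu counters; component-aware tools iterate over all component indices instead.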
The bulk of the visible changes in PAPI-C are in the area of setting and getting option values. Options can be either system-wide or component-specific. This distinction did not matter in traditional PAPI, which had only one component; now it does. To preserve backward compatibility with code that only accesses the cpu component, PAPI_get_opt() and PAPI_set_opt() behave as before, using an implicit component index of 0 for options that are bound to a component. For component-specific options, PAPI_get_cmp_opt() and PAPI_set_cmp_opt() take an additional component index argument. Further, two new convenience functions, PAPI_set_cmp_domain() and PAPI_set_cmp_granularity(), have been added for setting these options on a specific component. More subtly, two of the cases handled by PAPI_set_opt() now require additional information in the data structures passed to it: both the PAPI_DEFDOM and PAPI_DEFGRN cases require a component index, since the available domains are component-dependent and may differ widely between, for example, cpu domains and network domains.
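The reason a default-domain request must name its component is that the set of valid domains differs from one component to the next. The hypothetical sketch below models that: each component advertises its own domain bits, the set request carries a component index, and a legacy-style getter forwards with an implicit component index of 0. Every toy_ name, the domain bit values and the per-component capabilities are invented; the real PAPI option structures differ.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_CMPS 2

/* Invented domain bits; each real component defines its own. */
#define TOY_DOM_USER   0x1
#define TOY_DOM_KERNEL 0x2

/* Domains each component supports, and its current default domain. */
static const int toy_avail_domains[TOY_NUM_CMPS] = {
    TOY_DOM_USER | TOY_DOM_KERNEL,   /* component 0: cpu */
    TOY_DOM_USER,                    /* component 1: e.g. a network component */
};
static int toy_default_domain[TOY_NUM_CMPS] = { TOY_DOM_USER, TOY_DOM_USER };

/* Payload for a "set default domain" request. Like the PAPI_DEFDOM case in
   PAPI-C, it carries the index of the component the setting applies to. */
typedef struct {
    int cidx;
    int domain;
} toy_defdom_t;

static int toy_set_defdom(const toy_defdom_t *req) {
    assert(req != NULL);
    if (req->cidx < 0 || req->cidx >= TOY_NUM_CMPS)
        return -1;
    /* Reject an empty domain or any bit the component does not support. */
    if (req->domain == 0 || (req->domain & ~toy_avail_domains[req->cidx]) != 0)
        return -1;
    toy_default_domain[req->cidx] = req->domain;
    return 0;
}

/* Component-aware getter, in the spirit of PAPI_get_cmp_opt(). */
static int toy_get_cmp_defdom(int cidx) {
    if (cidx < 0 || cidx >= TOY_NUM_CMPS)
        return -1;
    return toy_default_domain[cidx];
}

/* Backward-compatible getter, in the spirit of PAPI_get_opt(): it forwards
   with an implicit component index of 0, the cpu. */
static int toy_get_defdom(void) {
    return toy_get_cmp_defdom(0);
}
```

In this model a kernel-domain request succeeds for the cpu component but is rejected for the network component, which is exactly the ambiguity a component-less request could not resolve.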
There are very few visible changes in the build environment. As before, cpu components are automatically detected by configure and included in the build. As new components are added, each is supported by a configure command-line option of the form:
--with-<cmp>=yes
Currently supported component options include:
--with-acpi=yes
--with-mx=yes
--with-net=yes
In the future, component support is intended to be autodetected by configure where possible, in the same way as cpu architectures, and automatically included in the build.
The make process currently compiles and links the sources for all requested components into a single binary. This process is automatic and transparent once the components are specified in the configure step. Future releases are intended to build each component independently and to allow components to be loaded dynamically at runtime.
Very few changes are needed to run existing PAPI-enabled applications under PAPI-C. The discussion below highlights the changes we found necessary in porting our test applications to the modified API: