PAPITopics:RAPL Access

Accessing the Intel RAPL Registers

With the Sandy Bridge architecture, Intel introduced the "Running Average Power Limit" (RAPL) feature. Although primarily intended to control or limit power usage on the chip, this feature also has power and energy measurement capabilities that make it interesting for PAPI.

It appears that work is underway to introduce a kernel driver for Linux that would allow for easy access to this information [1]. Meanwhile, the PAPI RAPL component relies on reading the RAPL MSR registers directly.

The PAPI RAPL component uses the Linux MSR kernel module to read model specific registers (MSRs) from user space. To make the msr interface accessible, an administrator must load the module and run 'chmod 666 /dev/cpu/*/msr'. For kernels older than 3.7, this is all that is required to use the PAPI RAPL component.
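As an illustration of what reading an MSR directly involves, here is a minimal sketch in C. It is not taken from the PAPI source; the helper names open_msr and read_msr are hypothetical, and the register addresses are the ones listed in the Intel Software Developer's Manual. The msr device exposes each register at a file offset equal to its address, so a read is an 8-byte pread at that offset.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() under strict -std= modes */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* MSR addresses as documented in the Intel SDM (not defined by PAPI). */
#define MSR_RAPL_POWER_UNIT   0x606
#define MSR_PKG_ENERGY_STATUS 0x611

/* Hypothetical helper: open the msr device node for one logical CPU.
 * Returns a file descriptor, or -1 on failure (module not loaded,
 * insufficient permissions, or missing capability). */
int open_msr(int cpu)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    return open(path, O_RDONLY);
}

/* Hypothetical helper: read one 64-bit MSR. The register address is
 * used as the file offset. Returns 0 on success, -1 on a failed or
 * short read. */
int read_msr(int fd, uint32_t reg, uint64_t *value)
{
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

A caller would typically do something like `int fd = open_msr(0);` followed by `read_msr(fd, MSR_PKG_ENERGY_STATUS, &raw);` and check both return values, since either step can fail when the module, permissions, or capabilities are not in place.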

Historically, the Linux MSR driver relied only on file system permission checks. This meant that any process running as root, regardless of its capability set, could read and write MSRs.

Since around version 3.7, the mainline Linux kernel requires an executable to have the CAP_SYS_RAWIO capability to open the MSR device file [2]. This change affects user programs that use PAPI APIs that rely on the MSR device driver. Besides loading the MSR kernel module and setting the appropriate file permissions on the msr device file, one must grant the CAP_SYS_RAWIO capability to any user executable that needs access to the MSR driver, using the command below:

setcap cap_sys_rawio=ep <user_executable>

Note that one needs superuser privileges to grant the RAWIO capability to an executable, and that the executable cannot be located on a shared network file system partition.
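To confirm from inside a program that the capability actually took effect, one option is to inspect the effective capability mask that Linux reports in /proc/self/status. The sketch below assumes that interface is available; the helper names are hypothetical, and the bit position 17 is the value of CAP_SYS_RAWIO in <linux/capability.h>. It is a diagnostic aid only, not something PAPI itself requires.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit position of CAP_SYS_RAWIO, as defined in <linux/capability.h>. */
#define CAP_SYS_RAWIO_BIT 17

/* Hypothetical helper: does an effective-capability mask include
 * CAP_SYS_RAWIO? Returns 1 if so, 0 otherwise. */
int mask_has_rawio(uint64_t cap_eff)
{
    return (int)((cap_eff >> CAP_SYS_RAWIO_BIT) & 1u);
}

/* Hypothetical helper: read the "CapEff:" line of /proc/self/status.
 * Returns 0 and stores the mask on success, -1 if the file or the
 * line cannot be read. */
int read_cap_eff(uint64_t *mask)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int status = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long long v;
        if (sscanf(line, "CapEff: %llx", &v) == 1) {
            *mask = (uint64_t)v;
            status = 0;
            break;
        }
    }
    fclose(f);
    return status;
}
```

If read_cap_eff succeeds and mask_has_rawio returns 0 for the resulting mask, the setcap step most likely did not apply, for example because the executable was rebuilt or copied afterwards.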

The dynamic linker on most operating systems will remove variables that control dynamic linking from the environment of executables with extended rights, such as setuid executables or executables with raised capabilities. One such variable is LD_LIBRARY_PATH. Therefore, executables that have the RAWIO capability can only load shared libraries from default system directories.

One can work around this restriction in one of three ways: by installing the shared libraries in system directories, by linking statically against those libraries, or by using the -rpath linker option to specify the full path to the shared libraries during the linking step.
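When a capability-enabled executable fails to find its shared libraries, it can help to check whether the process was started in the loader's secure-execution mode. The sketch below assumes Linux with glibc, where getauxval(AT_SECURE) reports that mode; the function names are hypothetical.

```c
#include <elf.h>       /* AT_SECURE */
#include <stdio.h>
#include <stdlib.h>
#include <sys/auxv.h>  /* getauxval() */

/* Returns 1 when the kernel started this process in secure-execution
 * mode (for example setuid, or file capabilities were granted), and
 * 0 otherwise. In that mode the dynamic loader ignores variables such
 * as LD_LIBRARY_PATH. */
int in_secure_exec_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}

/* Print what the process can see of its loader environment. */
void report_loader_environment(FILE *out)
{
    const char *path = getenv("LD_LIBRARY_PATH");

    fprintf(out, "secure-execution mode: %s\n",
            in_secure_exec_mode() ? "yes" : "no");
    fprintf(out, "LD_LIBRARY_PATH: %s\n", path ? path : "(unset)");
}
```

Calling report_loader_environment(stderr) early in main gives a quick indication of whether a library search problem comes from the stripped environment rather than from a missing installation.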



RAPL Events

There are two basic types of events that can be reported from RAPL:

  • Dynamic energy readings from various components of the chip, such as the package (PACKAGE_ENERGY), the DRAM (DRAM_ENERGY), the on-chip GPU (PP1_ENERGY), or the CPU cores (PP0_ENERGY)
  • Static fixed values for things like thermal specifications, maximum and minimum power caps, and time windows over which power is monitored.

As of PAPI 5.3, these events can be reported in two forms: as scaled values with units of joules, watts, or seconds, or, under slightly different event names, as raw binary values suitable for doing arithmetic.
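To give a sense of how a raw energy value relates to a scaled one, the following sketch applies the unit encoding described in the Intel manual: bits 12:8 of the MSR_RAPL_POWER_UNIT register hold an exponent ESU, and each raw count represents 1/2^ESU joules. The function names are hypothetical, and this is not a copy of PAPI's own scaling code.

```c
#include <stdint.h>

/* Extract the Energy Status Units exponent (bits 12:8) from a raw
 * MSR_RAPL_POWER_UNIT value, per the Intel SDM. */
unsigned rapl_energy_unit_exponent(uint64_t power_unit_msr)
{
    return (unsigned)((power_unit_msr >> 8) & 0x1f);
}

/* Convert a raw 32-bit energy count to joules.
 * Each count is worth 1 / 2^ESU joules. The exponent is at most 31,
 * so the shift below cannot overflow a 64-bit integer. */
double rapl_counts_to_joules(uint32_t raw_counts, uint64_t power_unit_msr)
{
    unsigned esu = rapl_energy_unit_exponent(power_unit_msr);
    return (double)raw_counts / (double)(1ULL << esu);
}
```

For example, with a power unit register value of 0xA1003 the exponent is 16, so one count is about 15.3 microjoules and 65536 counts correspond to one joule.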

You can learn more about these values from the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: Part 2. You can also use papi_native_avail -i rapl to examine the exact event names available on your system.

RAPL Tests

The rapl/tests directory contains two source files that produce four different test programs. rapl_overflow.c generates a program to exercise the overflow capability of the RAPL energy counters. rapl_basic.c generates three similar test cases: basic energy measurement over a sleep period (rapl_basic), energy measurement while doing work, in this case a naive gemm (rapl_busy), and a variation that estimates how long a naive gemm workload would take to overflow the RAPL energy counters (rapl_wraparound).

RAPL energy counters are free-counting 32-bit register values. The PAPI rapl component is designed to handle the full 32-bit dynamic range of the register, but it can't accommodate counting periods that would fill more than 32 bits. To estimate how long this would be, rapl_wraparound does a small gemm calculation to determine how many counts are generated and then extrapolates to the full 32-bit range. Obviously, different workloads will consume energy at different rates. To test on your own workload, modify the rapl_basic.c source code, replace the run_test function with your own kernel, and recompile. You can also run rapl_wraparound with a -w option (for wraparound) to execute for the predicted amount of time and observe how much of the 32-bit dynamic range was actually used.
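The arithmetic behind this kind of estimate can be sketched in a few lines of C. The functions below are hypothetical illustrations rather than code from rapl_basic.c: the first computes the counts elapsed between two readings of a free-running 32-bit counter, and the second extrapolates a short calibration run to the full 2^32 range.

```c
#include <stdint.h>

/* Counts elapsed between two readings of a free-running 32-bit
 * counter. Unsigned subtraction is performed modulo 2^32, so a single
 * wraparound between the readings is handled correctly; two or more
 * wraparounds cannot be detected. */
uint32_t rapl_delta(uint32_t before, uint32_t after)
{
    return (uint32_t)(after - before);
}

/* Extrapolate how many seconds the same workload would need to fill
 * the full 32-bit range, given the counts observed during a short
 * calibration run. Returns -1.0 when no estimate is possible. */
double seconds_to_wraparound(uint32_t counts, double elapsed_seconds)
{
    if (counts == 0 || elapsed_seconds <= 0.0)
        return -1.0;
    return 4294967296.0 * elapsed_seconds / (double)counts;
}
```

As a rough worked example under assumed numbers: if one count is 1/2^16 joules, the full range is 65536 joules, so a package drawing a steady 100 watts would wrap the counter after roughly 655 seconds.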