We have been using the papi-rapl component for a long time, but most of our runs give wrong results because of register overflow. We are aware of the problem described in Intel's Sandy Bridge handbook regarding the MSR_PKG_ENERGY_STATUS register: "MSR PKG ENERGY STATUS: reports measured actual energy usage. This read-only MSR is updated every 1ms and has a wraparound time of about 60s when the power consumption is high; otherwise it may be longer."
To avoid this problem, we read the counter at least once per minute: we install a SIGALRM handler that points to a refresh function and call alarm(TIME), with TIME < 60, of course.
#define REFRESH 1 // Each second
int in = 0;
/* ... */
void refresh(int iSignal)
{
    long long llValues;
    PAPI_stop(eventSet, &llValues);
    PAPI_start(eventSet);
    printf("Restart: %d - %lld\n", in++, llValues);
    // Next refresh trigger
    alarm(REFRESH);
}
int main(int argc, char *argv[])
{
    /* ... PAPI initialization, event sets ... and signal forwarding (SIGALRM -> refresh) */
    /* ... */
    retval = PAPI_add_named_event(eventSet, "rapl:::PP0_ENERGY:PACKAGE0");
    if (retval != PAPI_OK && retval != PAPI_EISRUN) {
        /* ... */
    }
    // Start counters
    retval = PAPI_start(eventSet);
    /* ... */
    alarm(REFRESH);
    function(/* bla bla */); // could be something intensive or not...
    /* ... */
    retval = PAPI_stop(eventSet, &value);
    /* ... */
    printf("Outside: %d - %lld\n", in, value);
    /* ... */
}
Even so, with "refresh" called once per second, we still get the overflow warning:
Restart: 0 - 20280822754.
Restart: 1 - 19780670166.
Restart: 2 - 19713714600.
...
Restart: 64 - 19705352783. Time: 1358258014
Restart: 65 - 19698425293. Time: 1358258015
Warning! Over 60s since last read, potential overflow!
Outside: 66 - 5195053100
Our main question is: what is wrong with this approach? Don't PAPI_stop and PAPI_start reset (flush) the register?
Thanks in advance for any tips.