PAPI-RAPL Overflow


Postby cnavarrete » Tue Jan 15, 2013 11:17 am

Dear list members,
We have been using the papi-rapl component for a long time, but most of our runs give wrong results because of register overflow. We are aware of the problem described in Intel's Sandy Bridge handbook regarding the MSR_PKG_ENERGY_STATUS register: "MSR PKG ENERGY STATUS: reports measured actual energy usage. This read-only MSR is updated every 1ms and has a wraparound time of about 60s when the power consumption is high; otherwise it may be longer."
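As a rough check on the quoted figure, the wraparound time can be estimated from the counter width and the energy unit. The sketch below assumes a 32-bit counter and an energy unit of 1/65536 J per count; neither value is stated in the passage above, and the function name is made up for illustration.

```c
/* Sketch only: estimates how long a wrapping energy counter lasts.
   ASSUMPTIONS (not from the quoted handbook text): the counter is
   32 bits wide, and one count equals joules_per_count joules. */

/* Seconds until the counter wraps at a constant power draw in watts. */
double wrap_seconds(double watts, double joules_per_count)
{
    const double counts = 4294967296.0;   /* 2^32 distinct counter values */
    return counts * joules_per_count / watts;
}
```

Under these assumptions the full counter range is 65536 J, so wrap_seconds(1024.0, 1.0 / 65536.0) gives 64 s, while a 100 W load would take about 655 s to wrap, which fits the handbook's "otherwise it may be longer".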
To avoid this problem, we read the counter at least once per minute by installing a SIGALRM handler and calling alarm(TIME) (with TIME < 60, of course):
Code:
#define REFRESH 1   // Each second
int in=0;
...
void refresh(int iSignal)
{
        long long llValues;

        PAPI_stop(eventSet, &llValues);
        PAPI_start(eventSet);
        printf ("Restart: %d - %lld\n", in++, llValues);

        // Next refresh trigger
        alarm(REFRESH);
}
int main (int argc, char *argv[])
{
... <-- PAPI initializations, eventSets ... and signal forwarding
...
    retval = PAPI_add_named_event(eventSet, "rapl:::PP0_ENERGY:PACKAGE0");
    if (retval != PAPI_OK && retval != PAPI_EISRUN)
...
    // Start counters
    retval = PAPI_start(eventSet);
...
   alarm(REFRESH);
   function(bla bla) // could be something intensive or not...
...
  retval = PAPI_stop(eventSet, &value);
...
  printf ("Outside: %d - %lld\n", in, value);
...
}


Even when the "refresh" function is called once per second, we still get the overflow warning:

Code:
Restart: 0 - 20280822754.
Restart: 1 - 19780670166.
Restart: 2 - 19713714600.
...
Restart: 64 - 19705352783. Time: 1358258014
Restart: 65 - 19698425293. Time: 1358258015
Warning!  Over 60s since last read, potential overflow!
Outside: 66 - 5195053100


My main question is what is wrong with this kind of solution. Are "start" and "stop" not flushing the register?
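The periodic read can also be arranged so that the handler only sets a flag and the actual read happens in normal code; printf, for one, is not on POSIX's list of async-signal-safe functions. A minimal sketch of that pattern follows. It contains no PAPI calls: read_interval_nj() is a hypothetical placeholder for the PAPI_stop/PAPI_start pair, and all other names are made up for illustration. Whether this changes the warning shown above is not established here.

```c
#include <signal.h>
#include <unistd.h>

#define REFRESH 1   /* seconds between samples, as in the post */

/* Set by the handler, consumed by normal code. */
static volatile sig_atomic_t sample_due = 0;

/* Running total of all interval readings, in nJ. */
static long long total_nj = 0;

/* HYPOTHETICAL stand-in for the PAPI_stop()/PAPI_start() pair: returns
   the energy of the interval that just ended. Here it is a constant. */
long long read_interval_nj(void)
{
    return 1000;
}

/* Handler: only sets a flag and re-arms the timer. */
void on_alarm(int sig)
{
    (void)sig;
    sample_due = 1;
    alarm(REFRESH);
}

/* Installs the handler and arms the first alarm; returns 0 on success.
   signal() mirrors the post; sigaction() is the more portable choice. */
int install_sampler(void)
{
    if (signal(SIGALRM, on_alarm) == SIG_ERR)
        return -1;
    alarm(REFRESH);
    return 0;
}

/* Call regularly from the measured code: takes one sample if the alarm
   has fired since the last call. Returns 1 if a sample was taken. */
int sample_if_due(void)
{
    if (!sample_due)
        return 0;
    sample_due = 0;
    total_nj += read_interval_nj();
    return 1;
}
```

This only works if the measured code can call sample_if_due() regularly; if it is a black box, the read has to stay in the handler or move to a separate thread.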

Thanks in advance for any tips.
cnavarrete
 

Re: PAPI-RAPL Overflow

Postby jcebrian » Thu Aug 29, 2013 9:51 am

A possible patch is to replace the following code in both the read and stop functions of the linux-rapl.c component file. In addition, the user should read the values at intervals of less than 60 s (for example, by setting an alarm) to prevent overflow problems.

papi-5.2.0/src/components/rapl/linux-rapl.c

Old code:

Code:
     if (control->count[i] < 0 ) {
       printf("Error! overflow!\n");
     }


New code:

Code:
 #define WRAP_AROUND_VALUE                       0xffffffff
...
     if (control->count[i] < 0 ) {
       //   printf("Error! overflow!\n");
       // JMCG Control overflow, undo unit calc.
       control->count[i] = (long long)((double)(WRAP_AROUND_VALUE - (long long)(((double)(context->start_count[i])/1e9)*energy_divisor))/energy_divisor*1e9) + temp;
     }
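
The same correction can be expressed on raw counter values before unit conversion, where unsigned 32-bit subtraction handles a single wraparound without a special case. A minimal sketch, assuming the hardware counter is 32 bits wide (as WRAP_AROUND_VALUE suggests) and that energy_divisor is the number of counts per joule; the function names and the divisor of 65536 in the usage note are illustrative assumptions, not taken from linux-rapl.c.

```c
#include <stdint.h>

/* Raw counter difference that stays correct across ONE wraparound.
   Unsigned subtraction in C is modulo 2^32 for uint32_t operands. */
uint32_t raw_delta(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

/* Convert raw counts to nanojoules.
   ASSUMPTION: energy_divisor is counts per joule (e.g. 65536). */
long long raw_to_nj(uint32_t raw, unsigned energy_divisor)
{
    return (long long)((double)raw / energy_divisor * 1e9);
}
```

For example, raw_delta(0xFFFFFFF0u, 0x10u) is 0x20 counts, and raw_to_nj(65536u, 65536u) is 1000000000 nJ, i.e. 1 J. A delta computed this way is only valid if the counter wrapped at most once between the two reads, hence the advice to read at intervals of less than 60 s.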


Jm.
jcebrian
 

