
what's the precision of the PAPI event counters?

Posted: Thu Jan 28, 2010 3:09 am
by ichaos
I want to measure the L1 cache load misses of some code, but the code (written in assembly) is too short to give stable, precise values.
When I count L1 cache load misses with nothing between PAPI_start and PAPI_read, the result is anywhere from 1 to 12. So how can I get a precise value for a PAPI event?

My code is like this:

Inside a for loop:
/* Reset the counters in the Event Set to zero */
if (PAPI_reset(EventSet) != PAPI_OK) {
    printf("PAPI reset failed.\n");
    return -1; /* handle_error(1); */
}

/* some code */
.....

/* Read the counters in the Event Set */
if (PAPI_read(EventSet, values) != PAPI_OK) {
    printf("PAPI read failed.\n");
    return -1;
}

Then get the values and store them.

Re: what's the precision of the PAPI event counters?

Posted: Thu Jan 28, 2010 1:31 pm
by Dan Terpstra
You've answered your own question: "the code is too short to get a stable and precise value." Too many other things could be going on with the cache to guarantee accuracy in values this small. This is not a problem with PAPI, but with the phenomenon you are trying to measure. Try to do more work, or pin the process to the CPU, kill any daemons or interrupts, and pre-warm the cache to a known state before measuring. Migration between CPUs, daemons, interrupts, and a cold cache can each perturb the cache behavior at this level.
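
The "do more work" and "pre-warm the cache" suggestions can be sketched as below. This is only an illustration under stated assumptions, not PAPI code: read_counter_fn, work_fn, measure_amortized, and the sim_* helpers are hypothetical names. The simulated counter charges a fixed number of spurious events per read so the sketch runs without PAPI; in real use read_counter would wrap PAPI_read on an event set that is already counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types (assumptions, not part of PAPI):
 * read_counter_fn returns the current counter value, work_fn is the
 * short code under test. */
typedef long long (*read_counter_fn)(void *ctx);
typedef void (*work_fn)(void *ctx);

/* Simulated counter for illustration only: every read adds read_cost
 * spurious events, every run of the work adds work_cost events. */
struct sim {
    long long count;
    long long read_cost;
    long long work_cost;
};

long long sim_read(void *ctx)
{
    struct sim *s = ctx;
    s->count += s->read_cost;
    return s->count;
}

void sim_work(void *ctx)
{
    struct sim *s = ctx;
    s->count += s->work_cost;
}

/* Run the work once uncounted to warm the cache, then count `reps`
 * back-to-back runs and return the average per run. The fixed cost of
 * reading the counter is spread over all repetitions. */
double measure_amortized(read_counter_fn read_counter, void *counter_ctx,
                         work_fn work, void *work_ctx, size_t reps)
{
    assert(reps > 0);

    work(work_ctx); /* warm-up run, outside the counted region */

    long long before = read_counter(counter_ctx);
    for (size_t i = 0; i < reps; ++i)
        work(work_ctx);
    long long after = read_counter(counter_ctx);

    return (double)(after - before) / (double)reps;
}
```

With a single repetition the read overhead is added in full to the result; with many repetitions the average approaches the true per-run cost. Pinning the process and silencing daemons are separate, OS-level steps that this sketch does not cover.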

Re: what's the precision of the PAPI event counters?

Posted: Fri Jan 29, 2010 2:25 am
by ichaos
Dan Terpstra wrote: You've answered your own question: "the code is too short to get a stable and precise value." Too many other things could be going on with the cache to guarantee accuracy in values this small. This is not a problem with PAPI, but with the phenomenon you are trying to measure. Try to do more work, or pin the process to the CPU, kill any daemons or interrupts, and pre-warm the cache to a known state before measuring. Migration between CPUs, daemons, interrupts, and a cold cache can each perturb the cache behavior at this level.


Yeah, I know the noise in the system makes precise measurement hard. However, what I want to know is: how many cycles does the PAPI_read call (from user to kernel) cost, and how many cache misses does it cause? Is that cost stable or unstable?
Thanks.

Re: what's the precision of the PAPI event counters?

Posted: Fri Jan 29, 2010 8:59 am
by Dan Terpstra
Since the system is inherently non-deterministic (you can't guarantee the state of the cache when you make PAPI calls), the best you can do is determine stability within error bars. Your measurements suggest those error bars span roughly 1 to 12 cache misses. That will obviously affect the cycle counts, too.
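
One way to put numbers on those error bars is to repeat the empty measurement many times and summarize the spread. The following is a minimal sketch under stated assumptions: read_counter_fn, overhead_stats, measure_overhead, and the sim_counter helper are hypothetical names, and the simulated counter (which adds 1 to 12 spurious events per read) only stands in for a real wrapper around PAPI_read so that the sketch runs without PAPI.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback type: returns the current counter value. */
typedef long long (*read_counter_fn)(void *ctx);

struct overhead_stats {
    long long min;
    long long median; /* upper median when the trial count is even */
    long long max;
};

/* Simulated counter for illustration only: each read adds between
 * 1 and 12 spurious events, cycling deterministically. */
struct sim_counter {
    long long count;
    unsigned step;
};

long long sim_read(void *ctx)
{
    struct sim_counter *s = ctx;
    s->count += 1 + (long long)(s->step % 12);
    s->step++;
    return s->count;
}

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Take `trials` pairs of back-to-back reads with nothing in between
 * and report the smallest, middle, and largest difference.
 * Returns 0 on success, -1 if memory could not be allocated. */
int measure_overhead(read_counter_fn read_counter, void *ctx,
                     size_t trials, struct overhead_stats *out)
{
    assert(trials > 0 && out != NULL);

    long long *diff = malloc(trials * sizeof *diff);
    if (diff == NULL)
        return -1;

    for (size_t i = 0; i < trials; ++i) {
        long long first = read_counter(ctx);
        long long second = read_counter(ctx);
        diff[i] = second - first;
    }

    qsort(diff, trials, sizeof *diff, cmp_ll);
    out->min = diff[0];
    out->median = diff[trials / 2];
    out->max = diff[trials - 1];

    free(diff);
    return 0;
}
```

A measurement of real code is then only meaningful if it exceeds the reported maximum by a clear margin. Subtracting the median as a baseline is a common choice, but the residual uncertainty is still on the order of max minus min.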

Re: what's the precision of the PAPI event counters?

Posted: Wed Mar 31, 2010 6:02 pm
by gnn
I would be interested to know, though, what values people generally get on modern processors. I note that on Nehalem, profiling anything shorter than 1000 iterations of a loop calling log (my test load) does not give accurate answers.

Trace: 4321 instructions, 1 reps
Trace: 1739 instructions, 10 reps
Trace: 9137 instructions, 100 reps
Trace: 83864 instructions, 1000 reps
Trace: 831747 instructions, 10000 reps

The code being measured is this:

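#include <math.h>   /* log */
#include <stdint.h> /* uint64_t */

void dummy_load(uint64_t reps)
{
    double dummySum = 0;

    /* dummySum is never read, so an optimizing compiler may drop this loop. */
    for (uint64_t i = 0; i < reps; ++i) {
        dummySum += log(i+1); // A bit slower than sqrt().
    }
}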
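
The Trace figures above can be used to separate the per-iteration cost from the fixed measurement overhead. This is a rough sketch that assumes a linear model, count = fixed + per_rep * reps, which the thread does not state; linear_fit and fit_two_points are hypothetical names. Fitting the two largest runs (1000 and 10000 reps) gives about 83 instructions per iteration and a fixed cost of roughly 770 instructions, which would explain why runs of 100 reps or fewer are dominated by overhead and noise.

```c
#include <assert.h>

/* Result of fitting count = fixed + per_rep * reps (assumed model). */
struct linear_fit {
    double per_rep; /* events attributed to one loop iteration */
    double fixed;   /* events attributed to measurement overhead */
};

/* Fit a straight line through two (reps, count) measurements. */
struct linear_fit fit_two_points(double reps_a, double count_a,
                                 double reps_b, double count_b)
{
    assert(reps_a != reps_b);

    struct linear_fit f;
    f.per_rep = (count_b - count_a) / (reps_b - reps_a);
    f.fixed = count_a - f.per_rep * reps_a;
    return f;
}
```

For example, fit_two_points(1000, 83864, 10000, 831747) uses the last two Trace lines. Two points give no indication of the noise, so the result is only an order-of-magnitude estimate.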