counting L1 misses for simple example

Postby kodgireabhijeet » Sat Apr 03, 2010 4:14 am

Hello All,
I am working on an Intel Xeon machine with the following L1 data cache.

Cache Information.

L1 Data Cache:
Total size: 32 KB
Line size: 64 B
Number of Lines: 512
Associativity: 8

I am running a small program to count the number of L1 misses.

The following code accesses the data:


int *temp = (int *) malloc(1024 * 1024 * sizeof(int));

if ((retval = PAPI_start(EventSet)) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_start", retval);

/* Write 8192 ints = 32 KB, i.e. 512 cache lines of 64 B */
for (i = 0; i < 8192; i++) {
    temp[i] = 10;
}

if ((retval = PAPI_read(EventSet, &values[0])) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_read", retval);

I am expecting 512 L1 cache misses (8192 ints x 4 bytes = 32 KB, which is 512 lines of 64 B), but I am getting only around 150 L1 misses, and the count fluctuates on every execution.
Am I missing anything? Any help would be appreciated.
kodgireabhijeet
 
Posts: 3
Joined: Sat Apr 03, 2010 3:49 am

Re: counting L1 misses for simple example

Postby vweaver1 » Mon Apr 05, 2010 5:55 pm

kodgireabhijeet wrote: I am working on an Intel Xeon machine with the following L1 data cache.


What type of Intel Xeon is this exactly?

Also, what compiler and compiler options did you use to compile your example?
vweaver1
 
Posts: 50
Joined: Wed Feb 17, 2010 4:02 pm

Re: counting L1 misses for simple example

Postby kodgireabhijeet » Wed Apr 07, 2010 10:27 pm

Here is more information about my machine.

akodgire@timon ~/papi $ uname -a
Linux timon 2.6.32-gentoo-r5 #1 SMP Wed Feb 17 12:55:37 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5365 @ 3.00GHz GenuineIntel GNU/Linux

I am using gcc with only the -g option. I am not using any compiler optimization options.

Let me know if anything else is needed.
Thanks,
Abhijeet
kodgireabhijeet
 
Posts: 3
Joined: Sat Apr 03, 2010 3:49 am

Re: counting L1 misses for simple example

Postby vweaver1 » Tue Apr 13, 2010 9:09 am

Hello,

I've been able to reproduce your problem on a Core 2 machine I have here. For some reason the L1 dcache miss counts are always off by a large amount (the run-to-run variation is normal). I suspect this has to do with the advanced prefetching into L1 that modern Core 2 processors do. I tried to verify this by turning off all prefetching, but unfortunately that did not change the results. I'm still investigating to see if I can track down the source of the problem.
vweaver1
 
Posts: 50
Joined: Wed Feb 17, 2010 4:02 pm

Re: counting L1 misses for simple example

Postby kodgireabhijeet » Tue Apr 13, 2010 1:29 pm

Hello,

Thanks for looking into it. I had considered the hardware prefetching mechanism, but I couldn't find out how aggressively it prefetches the data.

One more interesting observation: I added initialization code just before the PAPI_start call, so the array elements are first initialized and then accessed again while the PAPI counters are active. Since the array was just initialized, all of its elements should already be in the cache for the second access. I was expecting fewer cache misses when I re-access those elements, but surprisingly the cache miss count increased almost 5 times. This result left me confused again, and I could not apply the hardware prefetcher theory here.


Code snippet:


int *temp = (int *) malloc(1024 * 1024 * sizeof(int));

for (i = 0; i < 8192; i++) { // Added the initialization of array elements
    temp[i] = 10;
}

if ((retval = PAPI_start(EventSet)) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_start", retval);

for (i = 0; i < 8192; i++) {
    temp[i] = 10;
}

if ((retval = PAPI_read(EventSet, &values[0])) != PAPI_OK)
    test_fail(__FILE__, __LINE__, "PAPI_read", retval);
kodgireabhijeet
 
Posts: 3
Joined: Sat Apr 03, 2010 3:49 am

Re: counting L1 misses for simple example

Postby Dmitry » Mon May 03, 2010 8:24 am

Do you set the thread affinity during testing?
Dmitry
 
Posts: 13
Joined: Mon Dec 14, 2009 2:16 pm

