Hello everyone,
I have developed the following MPI test program (matrix multiplication) in order to count the L2 cache misses of each process:
.....
/*Declaring and initializing the event set with the presets*/
int Events[3] = {PAPI_TOT_CYC,PAPI_TOT_INS,PAPI_L2_DCM};
...
if ((num_hwcntrs = PAPI_num_counters()) < PAPI_OK)
{
    printf("There are no counters available. \n");
    exit(1);
}
printf("There are %d counters in this system\n", num_hwcntrs);
if ((retval = PAPI_start_counters(Events, NUM_EVENTS)) != PAPI_OK)
    ERROR_RETURN(retval);
comp_ini = get_cpu_time();
for (l = task.loops; l; l--)
{
    for (k = 0; k < task.msize; k++)
    {
        for (i = 0; i < task.rows; i++)
        {
            *(c + ELEMENT(task.msize,i,k)) = 0.0;
            for (j = 0; j < task.msize; j++)
                *(c + ELEMENT(task.msize,i,k)) += *(a + ELEMENT(task.msize,i,j)) * *(b + ELEMENT(task.msize,j,k));
        }
    }
}
if ((retval = PAPI_read_counters(values, NUM_EVENTS)) != PAPI_OK)
    ERROR_RETURN(retval);
comp_fin = get_cpu_time();
comp_fin = comp_fin - comp_ini;
I ran the program with 16 processes (1 process per core) on a cluster with the following topology:
1 node * 4 sockets
1 socket has 2 Intel Xeon X7350 at 2.93 GHz
I then obtained the following results:
Computational time (var. comp_fin in the program): 124.105 sec.
Cycles: 363625567278
L2 misses: 1203956548
Measuring the memory latency with ld_mem_rd, I obtained 124 ns.
If I then use these simple equations:
Computational time = CPU time + memory access time
(124.105 sec)
Memory access time = L2 misses * memory latency ==> 1203956548 * 124 ns ==> 149.29 sec
Computational time < memory access time
124.105 < 149.29
Does this mean I am not obtaining the cache misses correctly with PAPI for MPI applications?
Could you help me?
Best regards
