I have a Nehalem CPU and would like to count the FLOPs my code executes. The code is a nested for loop that performs only double-precision operations:

```c
#include <stdio.h>
#include <papi.h>

#define INDEX 10

double mresult[INDEX][INDEX];

int main(void)
{
    /* PAPI_start_counters() takes an int array, so no cast is needed */
    int Events[2] = {PAPI_SP_OPS, PAPI_DP_OPS};
    long long values[2];
    int j, k;

    /* Initialize the Matrix here */

    if (PAPI_start_counters(Events, 2) != PAPI_OK)
        printf("ERROR at init.\n");

    /* Scale every element by 1/2: one DP division per element */
    for (j = 0; j < INDEX; j++)
        for (k = 0; k < INDEX; k++)
            mresult[k][j] = mresult[k][j] / 2;

    if (PAPI_stop_counters(values, 2) != PAPI_OK)
        printf("ERROR at end.\n");

    printf("\n \n single precision: %lld double precision: %lld \n \n", values[0], values[1]);
    return 0;
}
```

When I compile with the -O2 flag, I get the following output:

single precision: 0 double precision: 100

which is what I expected.

When I compile with the -O3 flag, I get the following output:

single precision: 150 double precision: 100

I know I should only look at the PAPI_DP_OPS value, but I am curious why exactly PAPI_SP_OPS is incremented at all when -O3 vectorizes the loop, given that the code contains no single-precision operations.