Multiple calls to PAPI_flops() problem

Postby SwissVince » Wed Oct 05, 2011 9:00 am

Hi all,

I'm facing a small problem that I don't understand. The PAPI documentation for the high-level interface says:

The values of rtime and ptime are derived from the cycle counter on the Pentium chip, and multiplied by a computed clock speed for the given processor as determined by measuring against a system real-time clock. You can stop the counters used by PAPI_flops with a call to PAPI_stop_counters. The next call to PAPI_flops will start over with fresh values for all returned parameters.


Good, now I go for a small test:

Code:
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <papi.h>

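/* Busy loop meant to generate floating-point additions. */
void do_some_work(){
   int i, maxiter;
   double p = 0.0;   /* start the sum from a defined value */
   maxiter=1000000000;

   for (i=0;i<maxiter;i++){
      p+=(double)i;
   }
}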

int main()
{
   float real_time, proc_time,mflops;
   long long flpops;
   float ireal_time, iproc_time, imflops;
   long long iflpops;
   int retval;

   if((retval=PAPI_flops( &real_time, &proc_time, &flpops, &mflops))<PAPI_OK){   
      printf("retval: %d\n", retval);
      exit(1);
   }

   do_some_work();

   if((retval=PAPI_flops( &real_time, &proc_time, &flpops, &mflops))<PAPI_OK){   
      printf("retval: %d\n", retval);
      exit(1);
   }
   printf("Real_time: %f Proc_time: %f Total flpops: %lld MFLOPS: %f\n", real_time, proc_time,flpops,mflops);

   if((retval=PAPI_flops( &real_time, &proc_time, &flpops, &mflops))<PAPI_OK){   
      printf("retval: %d\n", retval);
      exit(1);
   }

   do_some_work();

   if((retval=PAPI_flops( &real_time, &proc_time, &flpops, &mflops))<PAPI_OK){   
      printf("retval: %d\n", retval);
      exit(1);
   }
   printf("Real_time: %f Proc_time: %f Total flpops: %lld MFLOPS: %f\n", real_time, proc_time,flpops,mflops);
   return 0;
}


compiled with

Code:
icc test.c -o test -I/path/to/PAPI/include -L/path/to/PAPI/lib -lpapi


Execution result:

Code:
Real_time: 0.000017 Proc_time: 0.000002 Total flpops: 12 MFLOPS: 4.907976
Real_time: 0.000065 Proc_time: 0.000016 Total flpops: 35 MFLOPS: 0.000000


on my Intel workstation. My question is the following: it seems that the counters are reset to zero the second time. Why? Did I write something wrong?

Thanks in advance to anyone who can help.

Cheers
Vince

Re: Multiple calls to PAPI_flops() problem

Postby vweaver1 » Thu Oct 06, 2011 10:53 am

SwissVince wrote:
Code:
Real_time: 0.000017 Proc_time: 0.000002 Total flpops: 12 MFLOPS: 4.907976
Real_time: 0.000065 Proc_time: 0.000016 Total flpops: 35 MFLOPS: 0.000000

Which Intel processor are you running this on? The family/model from /proc/cpuinfo would be useful.

It looks like you are only getting 12 and 35 floating-point ops, which means that icc is optimizing your test code away to almost nothing.

Other Vince

[SOLVED] Re: Multiple calls to PAPI_flops() problem

Postby SwissVince » Fri Oct 07, 2011 4:25 am

Dear other Vince,

Thanks a lot, you're absolutely right. It seems that gcc's default optimization level is -O0 while icc's is -O2. Compiling the same code without any optimization (or with gcc) works fine:

Code:
Real_time: 3.772312 Proc_time: 6.251390 Total flpops: 2000032384 MFLOPS: 319.934052
Real_time: 7.540337 Proc_time: 12.502334 Total flpops: 4000072192 MFLOPS: 319.958923


The processor is a Westmere (2x4 cores, HT)

Code:
vendor_id   : GenuineIntel
cpu family   : 6
model      : 44
model name   : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping   : 2


So, may I ask a newbie question: how does the compiler change a floating-point operation into a non-floating-point one, so that PAPI is not able to measure it?

Regards
Vince

Re: Multiple calls to PAPI_flops() problem

Postby vweaver1 » Mon Oct 10, 2011 10:51 am

SwissVince wrote:
So, may I ask a newbie question: how does the compiler change a floating-point operation into a non-floating-point one, so that PAPI is not able to measure it?



Your code just sums the integers from 0 to 999999999. The compiler can see this at compile time, and simply replaces your loop with code that produces the final value, as the result will always be the same.
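
For reference, the value such a loop accumulates has a simple closed form, which is why a compiler could in principle compute it ahead of time. The thread does not show what icc actually generated; since p is never read after the loop, it may equally have removed the loop as dead code (an assumption, not something confirmed here).

```latex
\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}, \qquad
n = 10^{9} \;\Rightarrow\; \frac{10^{9}\,(10^{9}-1)}{2} = 499\,999\,999\,500\,000\,000
```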

To get code that the compiler can't optimize away, you'll have to either add a random number or make the maxiter value a parameter of the function (and not make the function static). That way the compiler doesn't know what the final value will be and thus can't precompute it.