5.3 - Xeon Phi papi.h Offload problem

Post by xphi512 » Fri Dec 13, 2013 4:50 pm

Hey guys,


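I have a problem including the papi.h header file in offload code on the Xeon Phi. I cross-compiled the PAPI library on the host as described in the PAPI 5.3 documentation and ran ./papi_avail on the MIC. It lists the available preset events (columns: name, code, available, derived, description):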

Code: Select all
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_L2_LDM  0x80000019  Yes   No   Level 2 load misses
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  Yes   No   Vector/SIMD instructions (could include integer)
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_L1_DCA  0x80000040  Yes   No   Level 1 data cache accesses
PAPI_L1_ICA  0x8000004c  Yes   No   Level 1 instruction cache accesses



So PAPI itself works on the card. Then I tried to compile my offload code with papi.h included as follows:

Code: Select all
#pragma offload_attribute (push,target(mic))
#include "papi.h"
#pragma offload_attribute (pop)


I compiled the code with:

Code: Select all
icpc -openmp -O3 -offload-option,mic,ld,/home/$(ACCOUNT_NAME)/lib/papi/lib/libpapi.a foo.cpp -o foo.exe


and got the following error:

Code: Select all
foo.cpp(11): catastrophic error: cannot open source file "papi.h"
  #include "papi.h"
                   ^

compilation aborted for foo.cpp (code 4)


So my question is: how can I include the header file? Following the PAPI 5.3 instructions does not work for me.

Some information about the MIC I use:

Code: Select all
--------------------------------------------------------------------------------
PAPI Version             : 5.3.0.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : 0b/01 (1)
CPU Revision             : 3.000000
CPUID Info               : Family: 11  Model: 1  Stepping: 3
CPU Max Megahertz        : 1052
CPU Min Megahertz        : 842
Hdw Threads per core     : 4
Cores per Socket         : 60
Sockets                  : 1
CPUs per Node            : 240
Total CPUs               : 240
Running in a VM          : no
Number Hardware Counters : 2
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------


Hope you can help me :)


Here is my full code:
Code: Select all
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstdlib>
#include <cstddef>
#include <omp.h>
#include <immintrin.h>
#include <zmmintrin.h>

#pragma offload_attribute (push,target(mic))
#include "papi.h"
#pragma offload_attribute (pop)


void init( float* x, float* y, float* z, std::size_t n );


int main(int argc, char** argv) {

   float sum = 0.0f;
   const float a = 1.5f;
   std::size_t n = 4096 * 125;

   __attribute__((target(mic))) float *x;
   __attribute__((target(mic))) float *y;
   __attribute__((target(mic))) float *z;

   posix_memalign((void**) &x, 4096, n * sizeof(float) );
   posix_memalign((void**) &y, 4096, n * sizeof(float) );
   posix_memalign((void**) &z, 4096, n * sizeof(float) );



   init( x, y, z, n );



   #pragma offload target(mic) \
         in( n, a ) \
         in( x:length(n) align(4096) alloc_if(1) free_if(1) ) \
         in( y:length(n) align(4096) alloc_if(1) free_if(1) ) \
         inout( z:length(n) align(4096) alloc_if(1) free_if(1) )
   {
      omp_set_num_threads(240);
      #pragma omp parallel
      {
         __m512 x_, y_, z_, a_;

         a_ = _mm512_set1_ps( a );

         #pragma omp for schedule(static)
         for ( std::size_t j = 0; j < n; j += 16 ){

            x_ = _mm512_load_ps( x + j );
            y_ = _mm512_load_ps( y + j );

            y_ = _mm512_fmadd_ps ( a_, x_, y_);

            _mm512_store_ps( z + j, y_ );
         }
      }
   }



   //prevent compiler optimizations for unused variables
   for ( std::size_t j = 0; j < n; j++ ) sum += z[j];
   std::cout << std::endl << sum << std::endl;

   free(x);
   free(y);
   free(z);

   return 0;
}

inline void init( float* x, float* y, float* z, std::size_t n ) {

   for (std::size_t i = 0; i < n; i++){

      x[i] = 1.2f * (float)i;
      y[i] = 4.3f * (float)i;
      z[i] = 0.0f;
   }
}

Re: 5.3 - Xeon Phi papi.h Offload problem

Post by James Ralph » Thu Dec 19, 2013 4:22 pm

The missing papi.h error appears because the compiler searches only a few default directories for header files, and your PAPI install is not one of them.
To fix this, tell it to search the directory that contains papi.h by adding -I/home/$(ACCOUNT_NAME)/include to your compiler invocation (adjust the path to wherever your install put papi.h).
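
As a rough sketch of that fix, the snippet below looks for papi.h under an install prefix and prints the compile line from the question with the matching -I flag added. PAPI_PREFIX and the find_papi_include helper are assumptions made up for illustration and are not part of PAPI; the command is only echoed, not executed.

```shell
#!/bin/sh
# Hypothetical install prefix of the cross-compiled PAPI; adjust to your system.
PAPI_PREFIX="${PAPI_PREFIX:-$HOME/lib/papi}"

# Print the directory that contains papi.h under the given prefix.
# Returns non-zero if no papi.h is found there.
find_papi_include() {
    hdr=$(find "$1" -type f -name papi.h 2>/dev/null | head -n 1)
    [ -n "$hdr" ] || return 1
    dirname "$hdr"
}

if inc=$(find_papi_include "$PAPI_PREFIX"); then
    # Same command as in the question, plus the missing -I flag.
    echo icpc -openmp -O3 "-I$inc" \
        "-offload-option,mic,ld,$PAPI_PREFIX/lib/libpapi.a" foo.cpp -o foo.exe
else
    echo "papi.h not found under $PAPI_PREFIX" >&2
fi
```

If the script reports that papi.h was not found, the install step probably used a different prefix than the one assumed here.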

The 5.3 documentation for instrumenting offload code is a little light on detail.
To use PAPI in offload code you need both a host-native and a cross-compiled (MIC) build of the library.
An example is probably best; let's build the hybrid_native_avail source that comes in the utils directory.

[edit] This demo uses some common utility code specific to the PAPI tests and utilities.
In your own program you can safely ignore anything below with testlib in it.

We build and install a host version and a MIC-native version of PAPI in /home/ralph/host and /home/ralph/mic respectively.

First the host build:
./configure --prefix=/home/ralph/host
make
make install-all
make clean

Now the cross-compiled build for the MIC:
./configure --with-mic --prefix=/home/ralph/mic
make
make install-all

Then, in the utils directory, compile the example. The host library goes on the normal command line and the MIC libraries are passed through -offload-option:
icc -openmp -o hybrid_native_avail hybrid_native_avail.c \
    -I/home/ralph/host/share/papi/testlib -I/home/ralph/host/include \
    /home/ralph/host/lib/libpapi.a /home/ralph/host/share/papi/testlib/libtestlib.a \
    -offload-option,mic,ld,"/home/ralph/mic/lib/libpapi.a /home/ralph/mic/share/papi/testlib/libtestlib.a"
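
Before running the final icc line, it may help to confirm that both installs contain the files the command refers to. The sketch below is one way to do that. The check_files helper and the HOST_PREFIX and MIC_PREFIX variables are hypothetical names introduced here, and the default paths are placeholders for wherever the two builds were installed.

```shell
#!/bin/sh
# Hypothetical prefixes for the two installs; adjust to your system.
HOST_PREFIX="${HOST_PREFIX:-$HOME/host}"
MIC_PREFIX="${MIC_PREFIX:-$HOME/mic}"

# Report whether each listed file exists under a prefix.
# Usage: check_files PREFIX RELATIVE_PATH...
# Returns non-zero if at least one file is missing.
check_files() {
    prefix=$1
    shift
    status=0
    for f in "$@"; do
        if [ -f "$prefix/$f" ]; then
            echo "ok       $prefix/$f"
        else
            echo "missing  $prefix/$f"
            status=1
        fi
    done
    return "$status"
}

# The host build supplies the header and the host-side library.
check_files "$HOST_PREFIX" include/papi.h lib/libpapi.a \
    || echo "host PAPI install looks incomplete"

# The MIC build supplies the library handed to -offload-option.
check_files "$MIC_PREFIX" lib/libpapi.a \
    || echo "MIC PAPI install looks incomplete"
```

The testlib files are left out of the check because, as noted above, they are only needed for the PAPI utilities and not for your own program.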

Hope this gets you started,
James

Re: 5.3 - Xeon Phi papi.h Offload problem

Post by sameer_asal » Tue May 13, 2014 1:19 pm

I have a similar use case.

I am trying to measure the energy consumed by code offloaded to the MIC. I also need the number of cache misses and a count of the vector instructions executed on the MIC cores.

My question is: which counters exactly should I add when I create my event set? When I did this for RAPL I could enumerate the events in the "RAPL" component, but with MIC power I am a little confused. I would really appreciate any pointers here.


Thank you,
sameer_asal
 

