BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby Dmitry » Mon Jan 18, 2010 10:54 am

Good Afternoon,

My processor is an Intel Core i7 (Nehalem).

Measuring PAPI_VEC_DP and PAPI_VEC_SP doesn't work properly in my tests: both events return the same results.

Thanks in advance
Dmitry
 
Posts: 13
Joined: Mon Dec 14, 2009 2:16 pm

Re: BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby jagode00 » Mon Jan 18, 2010 6:58 pm

PAPI_VEC_DP and PAPI_VEC_SP are preset events. You can run the PAPI utility papi_avail with the -e option to see that both preset events are defined with the same native event on i7 (see the example below).

On Nehalem, there are no native events available to break the vector instructions (FP_COMP_OPS_EXE:SSE_FP_PACKED) down into single and double precision. This is different for scalar operations. Here we have two native events on Nehalem that allow us to break them down into single and double precision (FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION and FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION).

You should definitely get different values when you use the preset events PAPI_DP_OPS and PAPI_SP_OPS.
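To make the mapping concrete, here is a minimal C sketch (plain C, not a call into PAPI) of a preset-to-native lookup. Only the two table entries come from the papi_avail output and the explanation above; the struct preset_map type and the helpers native_for and same_native_event are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table type for illustration; PAPI keeps its real
 * preset definitions in its own internal tables. */
struct preset_map {
    const char *preset;
    const char *native;
};

/* The two vector presets as papi_avail reports them on Nehalem:
 * both resolve to the same packed-SSE native event. */
static const struct preset_map nehalem_vec_presets[] = {
    { "PAPI_VEC_SP", "FP_COMP_OPS_EXE:SSE_FP_PACKED" },
    { "PAPI_VEC_DP", "FP_COMP_OPS_EXE:SSE_FP_PACKED" },
};

/* Return the native event name for a preset, or NULL if the preset
 * is not in this small table. */
static const char *native_for(const char *preset)
{
    size_t n = sizeof nehalem_vec_presets / sizeof nehalem_vec_presets[0];
    assert(preset != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nehalem_vec_presets[i].preset, preset) == 0)
            return nehalem_vec_presets[i].native;
    }
    return NULL;
}

/* Return 1 if both presets are in the table and share one native
 * event, in which case their counts cannot differ; otherwise 0. */
static int same_native_event(const char *a, const char *b)
{
    const char *na = native_for(a);
    const char *nb = native_for(b);
    return na != NULL && nb != NULL && strcmp(na, nb) == 0;
}
```

Calling same_native_event("PAPI_VEC_SP", "PAPI_VEC_DP") returns 1 here, which is why the two presets always report identical values on this processor.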

Hope this helps!
heike.

Example:
$ ./papi_avail -e PAPI_VEC_SP
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version : 4.0.0.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel Core i7 (21)
CPU Revision : 5.000000
CPUID Info : Family: 6 Model: 26 Stepping: 5
CPU Megahertz : 2926.000000
CPU Clock Megahertz : 2926
Hdw Threads per core : 1
Cores per Socket : 4
NUMA Nodes : 2
CPU's per Node : 4
Total CPU's : 8
Number Hardware Counters : 7
Max Multiplex Counters : 32
--------------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Event name: PAPI_VEC_SP
Event Code: 0x80000069
Number of Native Events: 1
Short Description: |SP Vector/SIMD instr|
Long Description: |Single precision vector/SIMD instructions|
Developer's Notes: ||
Derived Type: |NOT_DERIVED|
Postfix Processing String: ||
Native Code[0]: 0x4000201b |FP_COMP_OPS_EXE:SSE_FP_PACKED|
Number of Register Values: 2
Register[ 0]: 0x0000000f |Event Selector|
Register[ 1]: 0x00001010 |Event Code|
Native Event Description: |Floating point computational micro-ops, masks:SSE FP packed Uops|
jagode00
 
Posts: 29
Joined: Tue Aug 25, 2009 2:12 pm

Re: BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby Dmitry » Wed Jan 20, 2010 11:56 am

Thank you Heike!
I have one more question:
The event "PAPI_FP_OPS" counts floating point operations. The counter returns almost exactly half of the real number. Do you know why that is?

best regards
Dmitry
Dmitry
 
Posts: 13
Joined: Mon Dec 14, 2009 2:16 pm

Re: BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby jagode00 » Wed Jan 20, 2010 12:17 pm

The preset event PAPI_FP_OPS counts scalar floating point operations, while PAPI_DP_OPS (a fairly recently added event) counts both scalar and scaled vector operations (for double precision). Note that the same preset event exists for single precision (PAPI_SP_OPS).

I don't know the code you are investigating, but my guess is that it's optimized to make use of vector instructions (either through compiler options or the use of optimized library functions). In order to get the correct number of floating point operations, you might want to use the preset event PAPI_DP_OPS (for double precision) or PAPI_SP_OPS (for single precision) instead of PAPI_FP_OPS. Can you verify that this gives you the expected number of operations?
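As a rough illustration of what "scaled" means, here is a minimal C sketch of the arithmetic. It is not PAPI's actual derived-event formula, which I'm not reproducing here. It assumes 128-bit SSE registers (2 doubles or 4 floats per packed instruction), and the names sse_lanes, scaled_fp_ops and unscaled_fp_ops are hypothetical. The unscaled model, where a packed instruction counts once, is only one possible explanation for a result that is half the expected value.

```c
#include <assert.h>
#include <stdint.h>

enum precision { PREC_SINGLE, PREC_DOUBLE };

/* Elements per 128-bit SSE register: 4 floats or 2 doubles. */
static unsigned sse_lanes(enum precision p)
{
    assert(p == PREC_SINGLE || p == PREC_DOUBLE);
    return p == PREC_SINGLE ? 4u : 2u;
}

/* Scaled count: every scalar instruction is one operation, and every
 * packed instruction is weighted by the number of elements it holds. */
static uint64_t scaled_fp_ops(uint64_t scalar, uint64_t packed,
                              enum precision p)
{
    return scalar + (uint64_t)sse_lanes(p) * packed;
}

/* Unscaled count (an assumed model): each instruction counts once,
 * no matter how many elements it processes. */
static uint64_t unscaled_fp_ops(uint64_t scalar, uint64_t packed)
{
    return scalar + packed;
}
```

For a fully vectorized double-precision kernel that performs 1000 operations as 500 packed instructions, scaled_fp_ops(0, 500, PREC_DOUBLE) gives 1000, while unscaled_fp_ops(0, 500) gives 500, exactly half.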

Thanks,
heike.
jagode00
 
Posts: 29
Joined: Tue Aug 25, 2009 2:12 pm

Re: BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby Dmitry » Wed Jan 20, 2010 12:53 pm

Yes, it does.
I use the Intel compiler v.11 with the options "..-O3 -xSSE4.2 -unroll-aggressive -pc64 -ansi-alias -openmp..". The code is something like BLAS, LAPACK....

The test calculates in double precision.
The preset event PAPI_DP_OPS returns the correct number of operations.
Is it correct that PAPI_DP_OPS is zero for single precision?
Dmitry
 
Posts: 13
Joined: Mon Dec 14, 2009 2:16 pm

Re: BUG v3.7.1 Intel Core i7/Nehalem PAPI_VEC_DP == PAPI_VEC_SP

Postby jagode00 » Wed Jan 20, 2010 2:08 pm

If the code you're analyzing calls LAPACK functions, then those routines are generally highly optimized in order to make use of SIMD instructions (if supported). So, PAPI_DP_OPS would be the right preset event to use if your test calculates with double precision. And, yes, PAPI_DP_OPS should be 0 if you switch to single precision. You want to use PAPI_SP_OPS for single precision instead.

-heike.
jagode00
 
Posts: 29
Joined: Tue Aug 25, 2009 2:12 pm

