Papi_flop issue Power8

Posted: Tue Mar 31, 2015 8:42 am
by timocafe
Hello all,

I am experimenting with PAPI and hardware counters on Power8. I think there is an issue with the PM_FLOP computation (used by PAPI_flops).
PM_FLOP is the sum of the counters PM_VSU{i}_{j}FLOP, where i is 1 or 2 (2 VSUs per core) and j is 1, 2, 4 or 8.
PM_FLOP and this sum do match. Nevertheless, if we analyse the matrix-hl.c test, a problem shows up.

At the end of the test there is an error check:

Code:
if ( event[0] == PAPI_FP_INS ) {
        /* Compare measured FLOPS to expected value */
        tmp = 2 * ( long long ) ( NROWS1 ) * ( long long ) ( NCOLS2 ) *
                ( long long ) ( NCOLS1 );
        printf( "%llu \n", tmp );
        if ( abs( ( int ) values[0] - ( int ) tmp ) > ( float ) tmp * 0.05 ) {
                /* Maybe we are counting FMAs? */
                tmp = tmp / 2;
                if ( abs( ( int ) values[0] - ( int ) tmp ) >
                         ( float ) tmp * 0.05 ) {
                        printf( "\n" TAB1, "Expected operation count: ", 2 * tmp );
                        printf( TAB1, "Or possibly (using FMA):  ", tmp );
                        printf( TAB1, "Instead I got:            ", values[0] );
                        test_fail( __FILE__, __LINE__,
                                   "Unexpected FLOP count (check vector operations)",
                                   1 );
                }
        }
}

The test reports no error. Nevertheless, if I remove the first branch and compile the test with -O3, for float and double I get:

Code:
Expected operation count:      11812500
Or possibly (using FMA):        5906250
Instead I got:                  3003761
matrix-hl.c - DOUBLE                             FAILED

Expected operation count:      11812500
Or possibly (using FMA):        5906250
Instead I got:                  1552507
matrix-hl.c - FLOAT                              FAILED

At present I think the computation of PM_FLOP is wrong. As I understand it, each PM_VSU{i}_{j}FLOP counts not the number of flops but the number of instructions (mnemonics) completed. Consequently each PM_VSU{i}_{j}FLOP should be corrected by a factor: x1 for PM_VSU{i}_1FLOP, x2 for PM_VSU{i}_2FLOP, x4 for PM_VSU{i}_4FLOP and x8 for PM_VSU{i}_8FLOP.
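
To make the proposed correction concrete, here is a minimal sketch in C. It does not call PAPI at all: the array layout, the function names and the constants NUM_VSU and NUM_WIDTHS are made up for illustration, and the counter values are assumed to have been read already from the eight PM_VSU{i}_{j}FLOP events.

```c
#include <stddef.h>

#define NUM_VSU    2   /* two VSUs per core, as stated in this post */
#define NUM_WIDTHS 4   /* the 1, 2, 4 and 8 FLOP event variants */

/* Hypothetical weights: flops per completed instruction for j = 1, 2, 4, 8. */
static const long long flop_weight[NUM_WIDTHS] = { 1, 2, 4, 8 };

/* Plain sum of the counters, i.e. how PM_FLOP is described in this post. */
long long unweighted_flop( long long counts[NUM_VSU][NUM_WIDTHS] )
{
    long long total = 0;
    for ( size_t i = 0; i < NUM_VSU; i++ )
        for ( size_t j = 0; j < NUM_WIDTHS; j++ )
            total += counts[i][j];
    return total;
}

/* Proposed correction: scale each counter by its flops-per-instruction factor. */
long long weighted_flop( long long counts[NUM_VSU][NUM_WIDTHS] )
{
    long long total = 0;
    for ( size_t i = 0; i < NUM_VSU; i++ )
        for ( size_t j = 0; j < NUM_WIDTHS; j++ )
            total += flop_weight[j] * counts[i][j];
    return total;
}
```

With purely scalar code only the j = 1 column is non-zero and both functions return the same value; as soon as the compiler vectorizes, the two results diverge.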

In fact your test works because you compile with -O0; consequently the generated assembly contains only scalar operations, measured by PM_VSU{i}_1FLOP, where one scalar instruction is one flop. I did some tests on dgemm, basic vector addition and FMA, and they confirm my correction.
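
As a rough sanity check on the -O3 output earlier in this post, the ratio between the FMA-based expected count and the measured value can be computed directly. This is only a sketch: the helper name fma_ratio and the constant names are made up for illustration, and the numbers are copied from the output above.

```c
/* Values copied from the -O3 test output in this post. */
#define EXPECTED_FMA     5906250LL   /* "Or possibly (using FMA)" */
#define MEASURED_DOUBLE  3003761LL   /* "Instead I got", DOUBLE run */
#define MEASURED_FLOAT   1552507LL   /* "Instead I got", FLOAT run */

/* Ratio of the expected FMA count to a measured counter value. */
double fma_ratio( long long expected_fma, long long measured )
{
    return ( double ) expected_fma / ( double ) measured;
}
```

Calling fma_ratio( EXPECTED_FMA, MEASURED_DOUBLE ) gives a value close to 2, and fma_ratio( EXPECTED_FMA, MEASURED_FLOAT ) a value a little under 4. That would be in line with 2 doubles or 4 floats per 128-bit vector instruction, if the counter counts completed instructions rather than flops.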

If you have access to a Power8, could you verify?



Re: Papi_flop issue Power8

Posted: Thu Apr 09, 2015 4:06 pm
by jagode00

Unfortunately the PAPI team doesn't have access to any Power machines right now.

However, in general I would caution against the use of PAPI_flops() (at this point). We need to evaluate if the high-level API is flexible enough to still be of value or not.

The truth is that the high-level API calls PAPI_flops() and PAPI_flips() are not suitable or flexible enough when compiler optimization is used. PAPI_flops() uses the preset event PAPI_FP_OPS to count floating point operations. However, with today's instruction set extensions, there simply are not enough hardware counters available to count all necessary events at the same time in a single run. And often these events can't be combined to produce a single result that is correct in all cases. That's one of the reasons why we added more FP_OPS preset events such as:
PAPI_SP_OPS Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP Single precision vector/SIMD instructions
PAPI_VEC_DP Double precision vector/SIMD instructions

In order to use these events, however, one has to use either PAPI_start_counters and PAPI_stop_counters in the high level API, or the low-level API.

When optimization is enabled, the compiler is packing multiple operands into e.g. SSE registers. PAPI_FP_INS and PAPI_FP_OPS cannot compensate for the fact that these short vector operations are actually executing multiple FP operations. That's why we defined separate SP_OPS and DP_OPS events (for systems we have access to). To use these events properly requires some knowledge about both the code being measured and the way these events are defined.

Long story short, since we don't have access to any Power machines at this point, we haven't had a chance to define these additional predefined events (SP_OPS, DP_OPS, etc).
If you have a chance to provide us with access to your Power8 system, we would be happy to help.


Re: Papi_flop issue Power8

Posted: Fri May 01, 2015 6:18 am
by timocafe
Thank you for the answer. Unfortunately I am just a user of the P8, so I do not have the power to give you access... :/ To continue my investigation I will apply my correction.

Re: Papi_flop issue Power8

Posted: Mon Nov 02, 2015 5:53 am
by timocafe

Did you finally get to test PAPI on POWER8? I will be at SC15 (PMBS2015 workshop) if you want to talk about the issues.