About power8 performance counters and memory bandwidth

Posted: Tue Mar 17, 2015 1:50 pm
by pkumbhar
Hello All,

We are experimenting with our application kernels on Power8 and looking at various hardware counters.
For a set of application kernels, we would like to measure memory bandwidth.

On BG-Q, I am using PAPI to measure application bandwidth as:
Average memory bandwidth = (PEVT_L2_FETCH_LINE + PEVT_L2_STORE_LINE) * 128 bytes / elapsed_time.

On Power8, we are measuring bandwidth as:

(PM_L3_PREF_ALL + PM_L3_CO_MEM + PM_DATA_ALL_FROM_LMEM + PM_DATA_ALL_FROM_DMEM + PM_DATA_ALL_FROM_RMEM + PM_DATA_ALL_FROM_LL4 + PM_DATA_ALL_FROM_DL4 + PM_DATA_ALL_FROM_RL4) * 128 bytes / elapsed_time
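For reference, here is a minimal Python sketch of how the formula above can be evaluated once the event counts have been read. The function name `bandwidth_bytes_per_sec` and the counter values are hypothetical and only illustrate the arithmetic; the sketch does not read any counters itself and assumes every listed event corresponds to one 128-byte line transfer, which is exactly the assumption in question here.

```python
CACHE_LINE_BYTES = 128  # line size used in the formulas above


def bandwidth_bytes_per_sec(counts, elapsed_seconds, line_bytes=CACHE_LINE_BYTES):
    """Sum line-granular event counts and convert the total to bytes per second.

    counts: dict mapping event name -> count over the measured region.
    elapsed_seconds: wall-clock time of the same region.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(counts.values()) * line_bytes / elapsed_seconds


# Hypothetical counter readings, for illustration only (not measured data).
power8_counts = {
    "PM_L3_PREF_ALL": 4_000_000,
    "PM_L3_CO_MEM": 2_000_000,
    "PM_DATA_ALL_FROM_LMEM": 1_500_000,
    "PM_DATA_ALL_FROM_DMEM": 0,
    "PM_DATA_ALL_FROM_RMEM": 0,
    "PM_DATA_ALL_FROM_LL4": 500_000,
    "PM_DATA_ALL_FROM_DL4": 0,
    "PM_DATA_ALL_FROM_RL4": 0,
}

bw = bandwidth_bytes_per_sec(power8_counts, elapsed_seconds=2.0)
print(f"estimated bandwidth: {bw / 1e9:.3f} GB/s")
```

The same helper works for the BG-Q formula by passing a dict with only PEVT_L2_FETCH_LINE and PEVT_L2_STORE_LINE.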

Is this correct? When we use the above formula with the STREAM benchmark, we see 15-20% higher bandwidth than the benchmark itself reports (this is with a single thread).

What about multi-threaded applications on Power8? Are those counters shared, or does every thread need to measure them separately? (It's easy on BG-Q because the L2 counters are shared; I am not entirely sure about Power8.)

If someone could provide some pointers, it would be a great help!

Thanks!

Re: About power8 performance counters and memory bandwidth

Posted: Sun Nov 15, 2015 10:57 pm
by josefaria
Hi, did you find a solution? I am having the same problem.
Thanks!