About power8 performance counters and memory bandwidth

Open discussion of PAPI.

Post by pkumbhar » Tue Mar 17, 2015 1:50 pm

Hello All,

We are experimenting with our application kernels on Power8 and looking at various hardware counters.
For a set of application kernels, we would like to measure memory bandwidth.

On BG-Q, I am using PAPI to measure application bandwidth using:
Average memory bandwidth = (PEVT_L2_FETCH_LINE + PEVT_L2_STORE_LINE) * 128 bytes / elapsed_time.

On Power8, we are measuring bandwidth as:

(PM_L3_PREF_ALL + PM_L3_CO_MEM + PM_DATA_ALL_FROM_LMEM + PM_DATA_ALL_FROM_DMEM + PM_DATA_ALL_FROM_RMEM + PM_DATA_ALL_FROM_LL4 + PM_DATA_ALL_FROM_DL4 + PM_DATA_ALL_FROM_RL4) * 128 bytes / elapsed_time

Is this correct? When we use the above formula with the STREAM benchmark, we see 15-20% higher bandwidth than the benchmark itself reports (this is with a single thread).
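
To make the calculation concrete, here is a minimal sketch of the Power8 formula above in Python. It assumes the counter values have already been read (for example through PAPI) into a dictionary keyed by event name; the dictionary layout and the function names `bandwidth_bytes_per_sec` and `percent_over` are hypothetical helpers for illustration, not part of PAPI. It only reproduces the arithmetic and says nothing about whether this set of events is the right one to sum.

```python
CACHE_LINE_BYTES = 128  # line size used in the formulas above

# Event names copied verbatim from the Power8 formula above.
POWER8_EVENTS = (
    "PM_L3_PREF_ALL",
    "PM_L3_CO_MEM",
    "PM_DATA_ALL_FROM_LMEM",
    "PM_DATA_ALL_FROM_DMEM",
    "PM_DATA_ALL_FROM_RMEM",
    "PM_DATA_ALL_FROM_LL4",
    "PM_DATA_ALL_FROM_DL4",
    "PM_DATA_ALL_FROM_RL4",
)


def bandwidth_bytes_per_sec(counts, elapsed_seconds, events=POWER8_EVENTS):
    """Sum the listed event counts, scale by the line size, divide by time.

    `counts` is a hypothetical dict mapping event name -> counter value.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_lines = sum(counts[name] for name in events)
    return total_lines * CACHE_LINE_BYTES / elapsed_seconds


def percent_over(measured, reference):
    """Return how far `measured` exceeds `reference`, in percent."""
    return 100.0 * (measured - reference) / reference
```

With both the counter-derived figure and the figure printed by STREAM in the same units, `percent_over(measured, stream_reported)` gives the gap described above, which makes it easier to see how the discrepancy changes as individual events are dropped from the sum.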

What about multi-threaded applications on Power8? Are those counters shared, or does every thread need to measure them separately? (It is easy on BG-Q because the L2 counters are shared; I am not entirely sure about Power8.)

If someone could provide some pointers, it would be a great help!

Thanks!
pkumbhar
 
Posts: 1
Joined: Tue Mar 17, 2015 1:42 pm

Re: About power8 performance counters and memory bandwidth

Postby josefaria » Sun Nov 15, 2015 10:57 pm

Hi, did you find a solution? I am having the same problem.
Thanks!
josefaria
 
Posts: 1
Joined: Sun Nov 15, 2015 10:51 pm

