Hi,
I have a question about PAPI's multiplex mode.
I need to read the performance counters for floating-point operations (FLOPs) and for memory traffic at every level of the memory hierarchy (accesses to each cache level and to main memory).
I have read the PAPI documentation and some performance guides (Intel).
I know that the hardware has only a few performance counters...
I tested my event set with the PAPI tools (papi_event_chooser and papi_command_line), and in normal mode I cannot get all the counters because of this hardware limitation.
So I'm trying to use multiplex mode. In my first test I'm considering only the last-level cache (LLC).
I chose this event set:
{"PAPI_DP_OPS", "PAPI_L3_TCA", "PAPI_L3_TCR", "UNHALTED_CORE_CYCLES", "UNHALTED_REFERENCE_CYCLES"}
And the corresponding event set with native events (I believe):
{"FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE", "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE", "SIMD_FP_256:PACKED_DOUBLE", "perf::PERF_COUNT_HW_CACHE_LL:READ", "perf::PERF_COUNT_HW_CACHE_LL:WRITE", "UNHALTED_CORE_CYCLES", "UNHALTED_REFERENCE_CYCLES"}
I'm doing all the library and multiplex initialization as in the multiplex.* sample code.
But PAPI does not return the last two events in my set; their values are always zero.
From what I read about multiplexing in PAPI, I understood that this mode exists so that you can measure more events than there are hardware counters.
During the execution, PAPI time-shares the hardware counters (4 if Hyper-Threading is enabled) among my N events.
Some precision is lost, but I should get measurements for all the events in my event set.
But I don't know if I'm doing something wrong...
My processor is an Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz.
L1 cache: 32KB data, 32KB instruction per core
L2 cache: 256KB per core
L3 cache: 24MB accessible by all cores
QPI: 6.4GT/sec
2 processors: 24 cores
64 GB RAM
Is it possible to measure all the events in the same execution, or do I need to split the measurement and run the program several times, switching between different event sets?
Thanks for your help!
Best regards,
Rogério