Hi,
I'm trying to count some native events (L3_CACHE_MISSES and HYPERTRANSPORT_LINK0) for a program throughout its execution, and I'm running into some issues.
I'm using PAPI 3.7.0 on Linux with perfctr. The machine I'm experimenting on is a 4x4 AMD Opteron machine, and the process I want to measure has 16 threads, with one thread pinned to each core.
I just want the event counts for the whole process, so at first I tried setting the granularity to process level (PAPI_GRN_PROC), but it seems that the Linux substrate only supports thread-level granularity. So instead I tried starting a counter for each of the 16 threads in the program. However, after starting 4 counters, the rest of the threads fail with PAPI_ESYS.
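To make the per-thread approach concrete, here is a minimal sketch of how per-thread counts could be combined into one process-wide total, assuming POSIX threads. It is not the code from the program in question and it makes no PAPI calls: `measure_work`, `worker_t`, `process_total`, and `MAX_THREADS` are hypothetical names, and `measure_work` is a placeholder marking where each thread would start and stop its own counter.

```c
#include <pthread.h>

#define MAX_THREADS 16  /* assumption: one thread per core on a 16-core machine */

/* One slot per thread; each thread writes only to its own slot. */
typedef struct {
    int       id;
    long long count;
} worker_t;

/* Hypothetical placeholder for a per-thread counter reading. In a real
 * program, the thread would start its own counter before the measured
 * work and stop it afterwards, returning the value read. */
static long long measure_work(int id)
{
    return 1000LL * (id + 1);
}

static void *worker(void *arg)
{
    worker_t *w = arg;
    w->count = measure_work(w->id);
    return NULL;
}

/* Runs nthreads workers and returns the sum of their counts,
 * or -1 if the request is out of range or a thread could not be created. */
long long process_total(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    worker_t  slots[MAX_THREADS];
    int       started = 0;
    long long total = 0;

    if (nthreads < 0 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        slots[i].id = i;
        slots[i].count = 0;
        if (pthread_create(&tids[i], NULL, worker, &slots[i]) != 0)
            break;
        started++;
    }

    /* Join every thread that was started, then add up its slot. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += slots[i].count;
    }

    return started == nthreads ? total : -1;
}
```

Summing after `pthread_join` avoids any locking, because no slot is read until its thread has finished writing it.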
Any ideas why this might be happening and what I could do to get around it?