by jdmccalpin » Thu Feb 04, 2010 3:52 pm
The default configuration of the perfctr patch (at least through revision 2.6.31) disables access to all of the "shared" counters on Opteron systems (including Barcelona) when using the virtual interface that PAPI uses. This is not under PAPI's control -- PAPI reports the L3 events as "available" because it knows how to create a native event code for them. Only at run time does the perfctr driver check the hardware EventSelect value, and it aborts if the request targets a counter in the "shared" hardware (i.e., L3, crossbar, HyperTransport links, memory controllers, etc.).
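To make the "available but rejected at run time" distinction concrete, here is a minimal model of that kind of check. It is only a sketch, not the perfctr driver's code. It assumes the usual AMD convention that shared (northbridge) events have EventSelect bits [7:5] all set, and the names is_shared_event and driver_accepts are hypothetical. The two event codes are the Family 10h values as I understand them, so verify them against the BKDG before relying on them.

```python
# Illustrative model only; NOT the perfctr driver's actual code.
# Assumption: shared (northbridge) events have EventSelect bits [7:5] all set.
NB_MASK = 0xE0


def is_shared_event(event_select: int) -> bool:
    """Return True if the event code falls in the assumed shared range."""
    return (event_select & NB_MASK) == NB_MASK


def driver_accepts(event_select: int, allow_shared: bool = False) -> bool:
    """Hypothetical run-time check: refuse shared events unless allowed.

    allow_shared=True models a patch with the test removed.
    """
    return allow_shared or not is_shared_event(event_select)


# Example event codes (assumed Family 10h values; check the BKDG).
RETIRED_INSTRUCTIONS = 0x0C0  # per-core event
L3_CACHE_MISSES = 0x4E1       # shared event

if __name__ == "__main__":
    for name, code in [("retired instructions", RETIRED_INSTRUCTIONS),
                       ("L3 cache misses", L3_CACHE_MISSES)]:
        print(f"{name}: default={driver_accepts(code)}, "
              f"test removed={driver_accepts(code, allow_shared=True)}")
```

In this model an event code can always be built, which corresponds to PAPI listing the event as "available", while driver_accepts is the step that can still refuse it when the counters are actually programmed.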
On TACC's Ranger system, I modified the perfctr 2.6.31 patch to remove this test, so all Opteron preset and native events are available via PAPI. This makes it possible to get wrong answers (if you try to program the shared counters from more than one core), but it also makes it possible to get the right answers, so it seems like a reasonable tradeoff. The perfctr 2.6.39 and 2.6.40 versions take a different approach to this issue -- they bind any thread requesting a shared counter to core 0 on the chip that the thread is currently running on. I found that this did not play nicely with my own thread-binding code (it hung the system), so I opted for the less restrictive approach.
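For anyone trying to picture the "bind to core 0 on the current chip" behavior, a rough sketch of the mapping follows. It assumes logical CPU numbers are contiguous within each chip and that each chip has 4 cores (as on quad-core Barcelona). Real systems may number cores differently, so check the topology first. The helper names are hypothetical; os.sched_setaffinity is the standard Linux-only Python call.

```python
import os


def first_core_of_chip(cpu: int, cores_per_chip: int = 4) -> int:
    """Return the lowest-numbered core on the chip that holds `cpu`.

    Assumption: cores on one chip have contiguous logical CPU numbers.
    """
    if cpu < 0 or cores_per_chip <= 0:
        raise ValueError("cpu must be >= 0 and cores_per_chip must be > 0")
    return (cpu // cores_per_chip) * cores_per_chip


def bind_to_chip_core0(current_cpu: int, cores_per_chip: int = 4) -> int:
    """Pin the calling thread to core 0 of its chip (Linux only).

    The caller supplies current_cpu; this sketch does not detect it.
    """
    target = first_core_of_chip(current_cpu, cores_per_chip)
    os.sched_setaffinity(0, {target})  # 0 means "the calling thread"
    return target
```

With these assumptions, a thread the user has pinned to CPU 5 would be moved to CPU 4, which shows how such a policy can override an application's own binding.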