PAPI_MEM_SCY and PAPI_RES_STL on AMD Opteron 6172

Open discussion of PAPI.


Postby Aleksr9 » Tue Jul 31, 2012 10:20 am


I want to measure the time a program spends in the memory system, but PAPI_MEM_SCY does not seem to be available on the AMD Opteron 6172.
Can it somehow be derived from PAPI_RES_STL which includes stalls for "any resource"? And what exactly does "any resource" include?

My specs:
PAPI version
AMD Opteron 6172
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)

Any help is greatly appreciated. Thanks in advance.
Posts: 3
Joined: Tue Jul 31, 2012 10:05 am

Re: PAPI_MEM_SCY and PAPI_RES_STL on AMD Opteron 6172

Postby danterpstra » Wed Aug 01, 2012 9:49 am

It may be time to turn to AMD's performance counter documentation. You can find it here:
The event information in section 3.14 of that document, together with the event names reported by papi_native_avail, may give you what you need to measure what you're looking for.

Re: PAPI_MEM_SCY and PAPI_RES_STL on AMD Opteron 6172

Postby Aleksr9 » Thu Aug 02, 2012 8:10 am

Thanks for the answer. Reading the manual cleared a lot of things up, but I still have a couple of questions, if it's not too inconvenient for you.

PAPI_TOT_CYC - based on AMD native event CPU_CLK_UNHALTED

The number of clocks that the CPU is not in a halted state (due to STPCLK or a HLT instruction). Note: this
event allows system idle time to be automatically factored out from IPC (or CPI) measurements, providing the
OS halts the CPU when going idle. If the OS goes into an idle loop rather than halting, such calculations are
influenced by the IPC of the idle loop.
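
The note above about factoring idle time out of IPC (or CPI) comes down to dividing by unhalted cycles. A minimal sketch in Python, assuming the instruction and cycle counts were already read out (for example PAPI_TOT_INS and PAPI_TOT_CYC). The helper names are hypothetical and not part of PAPI, and the numbers are illustrative, not measurements:

```python
# Hypothetical helpers (not part of PAPI): plain arithmetic on counter
# values that were already collected, e.g. PAPI_TOT_INS and PAPI_TOT_CYC.
# On this CPU PAPI_TOT_CYC is based on CPU_CLK_UNHALTED, so cycles spent
# halted (idle, if the OS halts) are not in the denominator.


def ipc(instructions: int, unhalted_cycles: int) -> float:
    """Instructions retired per unhalted cycle (0.0 if no cycles)."""
    if unhalted_cycles == 0:
        return 0.0
    return instructions / unhalted_cycles


def cpi(instructions: int, unhalted_cycles: int) -> float:
    """Unhalted cycles per retired instruction (0.0 if no instructions)."""
    if instructions == 0:
        return 0.0
    return unhalted_cycles / instructions


# Illustrative numbers only, not measurements.
print(f"IPC = {ipc(3_000_000, 2_000_000):.2f}")  # IPC = 1.50
print(f"CPI = {cpi(3_000_000, 2_000_000):.2f}")  # CPI = 0.67
```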

I thought the CPU was only halted while waiting on main memory, but could it also be halted while waiting on the caches? That would explain a lot. I've been looking around to find out exactly when the Opteron gets halted, but haven't had any luck yet.

PAPI_RES_STL - based on AMD native event DISPATCH_STALLS

The number of processor cycles where the decoder is stalled for any reason (has one or more instructions ready
but can't dispatch them due to resource limitations in execution). This is the combined effect of events D2h -
DAh, some of which may overlap; this event reflects the net stall cycles. The more common stall conditions
(events D5h, D6h, D7h, D8h, and to a lesser extent D2) may overlap considerably. The occurrence of these
stalls is highly dependent on the nature of the code being executed (instruction mix, memory reference patterns, etc.).
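
Because the sub-events D2h to DAh can overlap, simply adding them up over-counts the net stall cycles. A minimal sketch, assuming DISPATCH_STALLS is exactly the union of the cycles flagged by its sub-events (as the quoted text suggests) and that all counts come from comparable runs. The helper names and numbers are hypothetical:

```python
# Hypothetical helpers (not part of PAPI). Assumption: the net
# DISPATCH_STALLS count (PAPI_RES_STL) is the union of the cycles flagged
# by the individual sub-events D2h-DAh, so their plain sum can exceed it.
from collections.abc import Sequence


def stall_fraction(net_stall_cycles: int, unhalted_cycles: int) -> float:
    """Fraction of unhalted cycles in which dispatch was stalled."""
    if unhalted_cycles == 0:
        return 0.0
    return net_stall_cycles / unhalted_cycles


def overcount(sub_event_counts: Sequence[int], net_stall_cycles: int) -> int:
    """How far the naive sum of sub-events exceeds the net count.

    Under the union assumption, a cycle flagged by k sub-events adds
    k - 1 to this total.
    """
    return max(0, sum(sub_event_counts) - net_stall_cycles)


# Illustrative numbers only, not measurements.
print(stall_fraction(400, 1_000))        # 0.4
print(overcount([250, 200, 100], 400))   # 150
```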

Then there's a list of the events it counts, none of which seems to have much to do with the caches.

Could it be that these two events, added together and combined with the time spent in the memory system, make up the execution time of a program?

Thanks again


I've done some more testing by running programs and dividing PAPI_TOT_CYC by the CPU clock rate, and I've been hitting the execution time quite accurately without adding any cache cycles. So CPU_CLK_UNHALTED does count cycles spent waiting on the caches. I'm still baffled, though, by how I've been getting more L1_DCA counts than total cycles.
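
The cycles-to-seconds check described above, and the ratio behind the L1_DCA puzzle, can be sketched as follows. This is a minimal sketch, not PAPI code: the helper names are hypothetical, and the 2.1 GHz clock is an assumed nominal frequency for the Opteron 6172 that only holds if the core ran at a constant clock during the measurement. A ratio above 1.0 just means more than one L1 data cache access per unhalted cycle on average, which a core able to perform several accesses per cycle could produce:

```python
# Hypothetical helpers (not part of PAPI). NOMINAL_HZ is an assumption:
# the nominal clock of an Opteron 6172. Substitute the frequency your
# core actually ran at; the conversion assumes that clock was constant.

NOMINAL_HZ = 2.1e9


def cycles_to_seconds(unhalted_cycles: int, clock_hz: float = NOMINAL_HZ) -> float:
    """Convert an unhalted-cycle count (e.g. PAPI_TOT_CYC) to seconds."""
    return unhalted_cycles / clock_hz


def events_per_cycle(event_count: int, unhalted_cycles: int) -> float:
    """Average events per unhalted cycle, e.g. L1 data cache accesses.

    A result above 1.0 is only possible if more than one such event
    can occur in a single cycle.
    """
    if unhalted_cycles == 0:
        return 0.0
    return event_count / unhalted_cycles


# Illustrative numbers only, not measurements.
print(f"{cycles_to_seconds(4_200_000_000):.1f} s")        # 2.0 s
print(f"{events_per_cycle(1_500, 1_000):.2f} per cycle")  # 1.50 per cycle
```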