Thanks for the answer. Reading the manual cleared a lot of things up, but I still have a couple of questions, if it's not too inconvenient for you.
PAPI_TOT_CYC - based on AMD native event CPU_CLK_UNHALTED
The number of clocks that the CPU is not in a halted state (due to STPCLK or a HLT instruction). Note: this
event allows system idle time to be automatically factored out from IPC (or CPI) measurements, providing the
OS halts the CPU when going idle. If the OS goes into an idle loop rather than halting, such calculations are
influenced by the IPC of the idle loop.
I had assumed the CPU was only halted while waiting on main memory, but could it also be halted while waiting for the cache? That would explain a lot. I've been looking around to find out exactly when the Opteron gets halted, but haven't had any luck yet.
PAPI_RES_STL - based on AMD native event DISPATCH_STALLS
The number of processor cycles where the decoder is stalled for any reason (has one or more instructions ready
but can't dispatch them due to resource limitations in execution). This is the combined effect of events D2h -
DAh, some of which may overlap; this event reflects the net stall cycles. The more common stall conditions
(events D5h, D6h, D7h, D8h, and to a lesser extent D2) may overlap considerably. The occurrence of these
stalls is highly dependent on the nature of the code being executed (instruction mix, memory reference patterns, etc.).
Then there's a list of the events it counts, none of which seem to have much to do with the caches.
Could it be that these two events should be added together and, combined with the time spent in the memory system, give the execution time of a program?
I've done some more testing by running programs and dividing PAPI_TOT_CYC by the CPU clock rate, and I've been hitting the execution time pretty accurately without adding the cache cycles. So CPU_CLK_UNHALTED apparently does count cycles spent waiting for the caches, and I'm still baffled as to how I've been getting more L1_DCA counts than total cycles.
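
For reference, the check I'm describing boils down to roughly the following. This is only a sketch: the counter values and the 2.2 GHz clock rate are made-up numbers for illustration, not my actual measurements, and the helper names are my own, not part of PAPI.

```python
def cycles_to_seconds(unhalted_cycles, clock_hz):
    """Estimate run time by dividing the cycle count by the clock rate."""
    return unhalted_cycles / clock_hz


def events_per_cycle(event_count, unhalted_cycles):
    """Average number of counted events per unhalted cycle."""
    return event_count / unhalted_cycles


# Hypothetical counter readings (assumed values, not real measurements).
tot_cyc = 4_400_000_000   # PAPI_TOT_CYC
l1_dca = 5_500_000_000    # L1_DCA
clock_hz = 2.2e9          # assumed nominal clock rate in Hz

elapsed = cycles_to_seconds(tot_cyc, clock_hz)
ratio = events_per_cycle(l1_dca, tot_cyc)

print(f"estimated run time: {elapsed:.3f} s")
print(f"L1 data cache accesses per cycle: {ratio:.2f}")
```

If the estimated run time matches the wall-clock time, the cycle count already includes whatever time was spent waiting on the caches. A ratio above 1 in the second line only says that, on average, more than one access was counted per unhalted cycle; whether the core can really do that, or whether the event counts something other than what I expect, is exactly what I can't work out.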