Quick question on the above-mentioned counters. When measuring both events for a memory-intensive piece of code (no disk I/O), PAPI_TOT_CYC is sometimes smaller than PAPI_RES_STL.
From the PAPI documentation I assumed that PAPI_TOT_CYC counts all cycles, and that the cycles counted by PAPI_RES_STL should therefore be a subset of them.
Platform is Intel E5450; according to `papi_decode`, PAPI_TOT_CYC is based on native UNHALTED_CORE_CYCLES, and PAPI_RES_STL is based on RESOURCE_STALL:ANY.
I found several sources via Google where people measured the percentage of cycles spent in resource stalls as RESOURCE_STALL:ANY / UNHALTED_CORE_CYCLES (or, with presets, PAPI_RES_STL / PAPI_TOT_CYC). That only makes sense if RESOURCE_STALL:ANY <= UNHALTED_CORE_CYCLES, and likewise PAPI_RES_STL <= PAPI_TOT_CYC, always holds.
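To make that expectation concrete, the check can be sketched as follows. This is only a minimal illustration in Python: the helper names `stall_fraction` and `is_consistent` and the sample counter values are made up for the example and are not measured data.

```python
def stall_fraction(res_stl: int, tot_cyc: int) -> float:
    """Return PAPI_RES_STL / PAPI_TOT_CYC for one measurement."""
    if tot_cyc <= 0:
        raise ValueError("PAPI_TOT_CYC must be positive")
    return res_stl / tot_cyc


def is_consistent(res_stl: int, tot_cyc: int) -> bool:
    """True if the stalled cycles do not exceed the total cycles."""
    return 0 <= res_stl <= tot_cyc


if __name__ == "__main__":
    # Hypothetical (PAPI_RES_STL, PAPI_TOT_CYC) readings, not real data.
    samples = [(600_000, 1_000_000), (1_100_000, 1_000_000)]
    for res_stl, tot_cyc in samples:
        frac = stall_fraction(res_stl, tot_cyc)
        status = "ok" if is_consistent(res_stl, tot_cyc) else "unexpected"
        print(f"RES_STL={res_stl} TOT_CYC={tot_cyc} "
              f"stall fraction={frac:.1%} ({status})")
```

The second sample reproduces the puzzling case: the ratio comes out above 100%, which is what should not happen if the stall counter only counted a subset of the unhalted cycles.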
Any explanation of this behavior would be highly appreciated.