I have a problem similar to the original poster's, with the difference that I am using the PAPI_get_virt_nsec() or PAPI_get_virt_usec() functions, which are supposed to account only for the time a thread spends in user mode (so time spent executing external sources should not be counted). The same application under the same workload is measured at ~5 sec on one run and ~7 sec on another, while a second application is measured at ~7 sec on one run and ~10 sec on another. The actual elapsed time is ~5 sec (or ~7 sec for the second application).
The application is written in Java and calls an agent in native code to use the PAPI counters. Running "top H" shows only 2 Java threads executing, and one of them has an almost constant CPU utilization of >98%. I initialize the libraries and register the threads that will be using the meters. During the installation of PAPI I got:
checking for working CLOCK_THREAD_CPUTIME_ID POSIX 1b timer... yes
checking for thread virtual clock or cycle counter... clock_thread_cputime_id
which would suggest everything is fine. The same applications return consistent results on another machine with a single CPU, further suggesting that there is nothing wrong with the approach.
The CPU of the "problematic" machine is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz, and perfex -i reports the cpu_type as 18 (Intel Core 2).
Could this behaviour have anything to do with the fact that I disabled the second core of the processor (echo 0 >> /sys/devices/system/cpu/cpu1/online)? Is there any other issue you can think of?
Thank you for your help,