Basically, every time there's a context switch, the counter values are saved and restored along with the rest of the thread's state, so the counts you read cover only the time your own code was running. This does add a small overhead to each context switch, but it's negligible and isn't attributed to your application. VTune may now be able to start and stop counters, but I believe it still counts system-wide, although that too may have changed with the introduction of counter support within the kernel. If you can guarantee pretty much exclusive access to a core, the system-wide and per-thread counts shouldn't be terribly different.
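
To make the save/restore idea concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not kernel code or any real tool's API: `Task`, `Core`, `simulate`, and the event numbers in the schedule are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    saved_count: int = 0  # counter value saved the last time this task was switched out


class Core:
    """Hypothetical core with one virtualized counter and one system-wide counter."""

    def __init__(self) -> None:
        self.hw_counter = 0   # saved/restored on every context switch
        self.system_wide = 0  # never saved/restored; counts everything on the core
        self.current: Optional[Task] = None

    def switch_to(self, task: Task) -> None:
        # Save the outgoing task's count as part of its state...
        if self.current is not None:
            self.current.saved_count = self.hw_counter
        # ...and restore the incoming task's count.
        self.hw_counter = task.saved_count
        self.current = task

    def run(self, events: int) -> None:
        # Events generated while the current task is on the core.
        self.hw_counter += events
        self.system_wide += events


def simulate():
    core = Core()
    app, other = Task("app"), Task("other")
    # Made-up schedule: (task, events generated during that time slice).
    for task, events in [(app, 100), (other, 40), (app, 250), (other, 10)]:
        core.switch_to(task)
        core.run(events)
    core.current.saved_count = core.hw_counter  # flush the last running task
    return app.saved_count, other.saved_count, core.system_wide


if __name__ == "__main__":
    app_count, other_count, total = simulate()
    print(f"app={app_count} other={other_count} system-wide={total}")
```

In this model the per-task count for `app` includes only its own time slices, while the system-wide count also picks up whatever `other` did. The fewer events `other` generates, the closer the two numbers get, which is the exclusive-core case.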