PapiEx - Command line/library utility to measure hardware performance counters with PAPI


This version of PapiEx is no longer supported directly by its authors, the University of Tennessee or former employees of SiCortex. An advanced and fully supported version is available from Samara Technology Group as part of their Performance Technology Platform. Samara Technology Group is a performance technology and services company staffed by the leading performance optimization experts in the industry and the authors of this software. Support and enhancements for this Open Source version are also available via contract. Please contact them at
sales at samaratechnologygroup.com

PapiEx is a performance analysis tool designed to transparently and passively measure the hardware performance counters of an application using PAPI. It uses Monitor to effortlessly intercept process/thread creation/destruction. It measures the entire run of an application. By default this includes all subprocesses. PapiEx's goal is to be a Linux substitute for the perfex command found in SGI's Speedshop. PapiEx is fairly simple to build, install and use. The most up-to-date documentation for monitor is always found in the man page.

Features

Download and Installation

Examples

The best documentation is in the form of examples. This example ASSUMES you have successfully built AND installed PapiEx AND that you're in the platform-specific build directory. First we run emacs and count Total Cycles and Total Instructions, redirecting the output from stderr (the default) to a file. Next we run the pthreads test case and tell PapiEx to create files.
[mucci@localhost]$ papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS emacs 2> sample.emacs
[mucci@localhost]$ tests/papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS tests/pthreads
Here's the output: sample.emacs, sample.pthreads.1, sample.pthreads.2, sample.pthreads.3 and sample.memory.
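
When the list of events grows, repeating -e by hand becomes error-prone. The following is a minimal sketch, assuming a POSIX shell. papiex_cmdline is a hypothetical helper and not part of PapiEx; it relies only on the -e flag shown above, and it prints the command line instead of running it so the result can be checked first.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of PapiEx): build a papiex command line
# with one -e flag per event name, followed by the target command.
# Usage: papiex_cmdline "EVENT1 EVENT2 ..." command [args...]
papiex_cmdline() {
    events=$1
    shift
    line="papiex"
    # $events is intentionally unquoted so it splits on whitespace.
    for ev in $events; do
        line="$line -e $ev"
    done
    printf '%s %s\n' "$line" "$*"
}

# Dry run: prints the command rather than executing it.
papiex_cmdline "PAPI_TOT_CYC PAPI_TOT_INS" emacs
# prints: papiex -e PAPI_TOT_CYC -e PAPI_TOT_INS emacs
```

Once the printed line looks right, it can be pasted at the prompt with the same 2> redirection used in the examples above.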

PapiEx can automatically multiplex and count useful events available on your architecture. This is similar in intent to perfex -a and hpmstat -a.

[mucci@localhost]$ papiex -a find /usr 2> sample.find
Multiplexed counts are estimates obtained by time-sharing the hardware counters among events, so for statistical relevance you should make sure that the run is reasonably long.

Multithreaded executables are handled seamlessly. PapiEx creates an output file named cmd.papiex.host.pid.instance. The user can prefix the output file name with the -pprefix flag. As an example:

[mucci@localhost]$ papiex -pmystats_ ./thrspecific 2>sample.thrspecific
The stderr output contains the aggregate statistics across all five threads of the executable. Individual per-thread statistics are placed in the directory:
mystats_thrspecific.papiex.localhost.localdomain.4444/task_0
Here are the files: thread_0.summary, thread_1.summary, thread_2.summary, thread_3.summary, thread_4.summary.
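
To collect the per-thread files from a finished run, the layout described above can be walked with a glob. This is a minimal sketch, assuming a POSIX shell and the task_N/thread_N.summary layout shown in this example. list_thread_summaries is a hypothetical helper, not a PapiEx command.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of PapiEx): print every per-thread
# summary file found under a PapiEx output directory.
# Assumes the layout <outdir>/task_N/thread_N.summary shown above.
list_thread_summaries() {
    outdir=$1
    for f in "$outdir"/task_*/thread_*.summary; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}

# Prints nothing if the directory does not exist.
list_thread_summaries mystats_thrspecific.papiex.localhost.localdomain.4444
```

The output can be piped to other tools, for example to page through each summary in turn.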

Now let's consider a more involved example with a threaded-MPI run.

[mucci@localhost]$ mpirun -np 4 papiex -f /tmp bin/mpich2-mpi-thrspecific 2>sample.mpich2-mpi-thrspecific
The -f flag instructs PapiEx to create all output files under /tmp. The aggregate statistics across all tasks (which in turn are aggregated across all the threads of each task) are written to stderr, which this example redirects to sample.mpich2-mpi-thrspecific. The per-task and per-thread statistics are placed in:
/tmp/mpich2-mpi-thrspecific.papiex.localhost.localdomain.4613
Per-task summaries, which are averaged across all the threads of a task, can be seen under this directory: task_0.summary, task_1.summary, task_2.summary, task_3.summary. The directory also contains per-task directories, which contain per-thread numbers as shown in the previous example.
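
When many runs accumulate under one directory, it helps to recover the command, host and pid from each output directory name. The sketch below assumes a POSIX shell and the cmd.papiex.host.pid directory form seen in the examples above (any -p prefix ends up as part of the cmd field). parse_papiex_dir is a hypothetical helper, not part of PapiEx.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of PapiEx): split an output directory
# name of the assumed form "cmd.papiex.host.pid" into its fields.
parse_papiex_dir() {
    base=${1##*/}              # strip any leading path
    cmd=${base%%.papiex.*}     # text before ".papiex."
    rest=${base#*.papiex.}     # remaining "host.pid"
    pid=${rest##*.}            # last dot-separated field
    host=${rest%.*}            # everything before the pid (may contain dots)
    printf 'cmd=%s host=%s pid=%s\n' "$cmd" "$host" "$pid"
}

parse_papiex_dir /tmp/mpich2-mpi-thrspecific.papiex.localhost.localdomain.4613
# prints: cmd=mpich2-mpi-thrspecific host=localhost.localdomain pid=4613
```

The host is taken as everything between ".papiex." and the final field, so fully qualified host names with dots are kept intact.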

Finally, let's consider how PapiEx simplifies the use of mpiP, a light-weight library for scalable profiling of MPI calls. Normally, mpiP needs to be linked into the target executable. The PapiEx driver allows seamless deployment of mpiP on dynamically-linked executables. Let's see this with an example:

[mucci@localhost]$ mpirun -np 4 papiex -e PAPI_L1_DCM -M bin/mpich2-simple-mpi 2> sample.mpich2-simple-mpi
In this example we instruct PapiEx to measure L1 data cache misses, and also to do MPI profiling with mpiP. The stderr output can be viewed in sample.mpich2-simple-mpi. The mpiP output is stored in mpich2-simple-mpi.mpiP.localhost.localdomain.4862.1. The PAPI task statistics are stored in:
mpich2-simple-mpi.papiex.localhost.localdomain.4862

CVS Access

Currently, the best way to get PapiEx is to get it directly from CVS. You can access the CVS repository with your browser or use the anonymous CVS pserver. Just hit enter when asked for the password.
% setenv CVSROOT :pserver:anonymous@cvs.eecs.utk.edu:/cvs/homes/ospat
% cvs login
Password: 
% cvs co papiex

Testing

The distribution includes a 'make test' phase. The current release has been tested on:

Bug Reports

Bugs should be submitted to the PAPI Mailing List.

Authors

PapiEx was written by Philip J. Mucci of the Innovative Computing Laboratory and SiCortex Inc. Major contributions and enhancements were made by Tushar Mohan, also of SiCortex Inc.

Copyright

This software is COMPLETELY OPEN SOURCE with an LGPL license. If you incorporate any portion of this software, I would appreciate an acknowledgement in the appropriate places. Should you find PapiEx useful, please consider making a contribution in the form of hardware, software or plain old cash.