papiex - transparently measure hardware performance events of an application with PAPI
papiex [-dhisVqnwlUKISarx] [-f output-dir] [-p prefix] [-o output-file] [-e papi-event] [-m[<interval>]] [-L papi-event] [--no-summary] [--no-scientific] [--no-ld-path] [--no-gather] [--] command [args ...]
papiex -- command args ...
papiex is a PAPI-based program for measuring the hardware performance events of an application from the command line. It supports both PAPI preset events and native events. It also supports multiple threads of execution, including pthreads and OpenMP threads. For MPI programs, papiex can gather statistics across tasks. papiex also measures the total time spent in I/O and MPI calls.
The default settings are equivalent to typing:
papiex -U -e PAPI_TOT_CYC -e PAPI_FP_OPS
If your processor doesn't support counting of floating point operations (like the UltraSparc II and the original AMD Athlon), then PAPI_FP_INS or PAPI_TOT_INS is used instead (in order of availability).
papiex honors the environment variable PAPIEX_DEFAULT_ARGS. The contents of this variable are appended to the command-line arguments every time papiex is invoked. It is suggested that platform-wide arguments be set via this variable.
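As an illustration, a site could put its common options in PAPIEX_DEFAULT_ARGS from a shell startup file. This is a minimal sketch assuming a POSIX shell; the directory /scratch/papiex is a hypothetical, site-chosen path, not a papiex default:

```shell
# Hypothetical site-wide default: collect all papiex output under one
# directory (-f). papiex creates the directory if it does not exist.
PAPIEX_DEFAULT_ARGS="-f /scratch/papiex"
export PAPIEX_DEFAULT_ARGS

# Every later invocation picks these arguments up automatically, e.g.:
#   papiex -e PAPI_TOT_CYC ./my-prog
```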
-- This is the standard separator telling papiex to terminate its option processing and pass the rest of the command line to the underlying shell. Use this if your application takes command line arguments.
Example: papiex -- ls -a
-a Monitor useful events available on the architecture automatically. This implicitly enables multiplexing (see the -m flag) and memory monitoring (see the -x flag).
-d Enable debugging output. Repeat this option for more verbosity.
-e <event> Monitor the named event. The event is a symbol as listed in the output of the -l or -L flag. You may specify more than one event. If you specify more events than there are physical counters (as listed by the -i flag), you must enable multiplexing with -m; otherwise an error is reported.
-f <output directory> All output of papiex is created under the specified output directory. If the directory does not exist, it is created. By default, all output is placed under the working directory.
-h Print the usage information.
-i Print information about the host processor.
-I Measure hardware counters in transient mode. This mode may not be supported on your processor. Some CPUs execute interrupt/TLB miss handlers in an entirely different privilege level. If your processor does not support this level, you will get an error when papiex sets up the counters. See also the -K, -U and -S flags for other modes.
-K Measure hardware counters in kernel mode. See also the -I, -U and -S flags for other modes.
-l Print a list of the available PAPI presets and native events.
-L <event> Print a full description of the named event.
-m[<interval>] Enable counter multiplexing to measure more events than the number of physical counters available. The number of counters can be discovered with the -i flag. The interval is specified in Hz; it is optional and defaults to 10 (Hz).
-n Do NOT create ANY output files. By default, in addition to writing to the terminal, papiex creates files (and directories) containing its output. See also -w.
-o <output-file> By default papiex writes its output to a file/directory named <cmd>.papiex.<host>.<pid>.<instance>. The -o flag instead sends the output to the user-supplied path. For multithreaded and MPI runs its behavior can be confusing, and you should probably use -f instead.
-p <prefix> By default papiex writes its output to a file/directory named <cmd>.papiex.<host>.<pid>.<instance>. The -p flag prepends <prefix> to the output name. This is useful for MPI and multithreaded runs. For readability, it is a good idea to end your prefix with a separator such as . (dot).
-q Print information in a less verbose format. This is just the counter value followed by the counter name. The only additional information printed is the timing information and any thread identifiers. It is printed right justified with a width of 16 places. This option is currently not compatible with -r.
-r Report resources used by the program, as reported by getrusage(). On Linux this often does not work. This option is currently not compatible with -q.
-s This option simply dumps the environment variable/value pairs to stdout and then exits.
-S Measure hardware counters in supervisor mode. See also the -I, -U and -K flags for other modes.
-U Measure hardware counters in user mode. This is the default counting mode. See also the -I, -K and -S flags for other modes.
-w Do NOT send output to the console. By default, in addition to writing files, papiex emits output to the terminal. See also -n.
-x Report memory information for the process. Not all statistics are available on all Linux kernel versions. Currently reported are peak virtual, peak resident, text, library, heap, stack, shared and locked memory, in KB. This option is currently not compatible with -q.
-V Print the version information of papiex, the PAPI library and the PAPI header file papiex was built against.
--no-summary For MPI and multithreaded runs this prevents papiex from printing a summary across threads/tasks.
--no-scientific Disables printing output in scientific notation.
--no-ld-path Do not modify the LD_LIBRARY_PATH environment variable under any circumstances.
--no-gather For MPI runs, do not gather per-process data on the front end for output. Instead, output the data on each of the nodes.
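The -m interval is a rate rather than a duration, so the time between counter switches is its reciprocal. A short worked example for the default rate, assuming papiex switches the active set of events once per period:

```latex
T = \frac{1}{f}, \qquad
f = 10\,\mathrm{Hz} \;\Rightarrow\; T = \frac{1}{10}\,\mathrm{s} = 100\,\mathrm{ms}, \qquad
\frac{180\,\mathrm{s}}{0.1\,\mathrm{s}} = 1800 \text{ switches in a three-minute run}
```

Multiplexed counts are, in general, estimates scaled up from the fraction of time each event was actually counted, so a run lasting only a few periods yields coarse numbers. This is one reason multiplexed measurement, including -a, suits longer runs better.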
The simplest use of papiex on a single-threaded, single-process program would be:
papiex /bin/ls
In the above case, the performance measurement of PAPI_TOT_CYC and PAPI_FP_OPS would be written to stderr. To monitor specific events explicitly, one would do:
papiex -e PAPI_L1_DCM -e PAPI_L1_TCM /bin/ls
For multithreaded programs, you would simply invoke papiex as above; the multiple threads are automatically handled. The output is written into a directory named <cmd>.papiex.<host>.<pid>.<instance>. In case you just want a high-level summary with no per-thread output files, you would do:
papiex -n -e PAPI_FP_OPS my-threaded-prog
Note the -n flag, which instructs papiex not to create any output files. In contrast, -w tells papiex not to write to the console, only to files. Alternatively, you may want per-thread files but no summary across threads. To achieve this, do:
papiex --no-summary my-threaded-app
Multi-task programs using MPI are handled automatically by papiex, as in:
mpirun -np 4 papiex -f /tmp ./pop
In the above example, the output data files are stored in /tmp/pop.papiex.<host>.<pid>.<instance>. Summary statistics are generated across tasks; these can be disabled with --no-summary. If you do not want the front end to gather statistics, and instead want each node to emit its own per-task data, use --no-gather. The emitted files are easily identified by the hostname embedded in the file name.
You can also give a prefix to the output path. For example:
papiex -p mystats. my-threaded-prog
The command above creates a directory ./mystats.my-threaded-prog.papiex.<host>.<pid>.<instance>. For multithreaded and/or MPI programs, papiex creates per-thread/per-task statistics and global summaries across threads/tasks, which are stored in this directory.
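The default naming scheme described under -f and -p (when -o is not used) can be summarized by a small helper, for instance when a job script needs to locate papiex output afterwards. This is a minimal sketch; papiex_output_name is a hypothetical function, not part of papiex, and the host, pid and instance values in the usage line are made up:

```shell
# Hypothetical helper: compose the default papiex output name,
#   [<dir>/]<prefix><cmd>.papiex.<host>.<pid>.<instance>
# Arguments: dir (empty means the working directory), prefix (may be empty),
# cmd (the program's base name, e.g. pop for ./pop), host, pid, instance.
papiex_output_name() {
    dir=$1 prefix=$2 cmd=$3 host=$4 pid=$5 instance=$6
    name="${prefix}${cmd}.papiex.${host}.${pid}.${instance}"
    if [ -n "$dir" ]; then
        printf '%s/%s\n' "$dir" "$name"
    else
        printf './%s\n' "$name"
    fi
}

# With -p mystats. and made-up host/pid/instance values:
papiex_output_name "" "mystats." my-threaded-prog node01 12345 0
# prints ./mystats.my-threaded-prog.papiex.node01.12345.0
```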
For ease of use, the -a flag is provided. It automatically monitors the interesting events available on the processor. To allow many events to be monitored with a limited number of counters, multiplexing (-m) is implicitly enabled. The flag also works for multithreaded and MPI programs. For any run longer than a few minutes, it is strongly recommended that you begin investigating performance with papiex -a. For example:
papiex -a my-long-program
fork(2), PAPI(3), getrusage(2), ld.so(8)
If you measure an application or process that makes use of the library preloading mechanism AND you disable the following of fork()s with the -n flag, the child processes will most likely die a horrible death.
Additional bugs should be reported to the OSPAT Mailing List at <email@example.com>.
papiex was written by Philip J. Mucci and Tushar Mohan
This software is COMPLETELY OPEN SOURCE. If you incorporate any portion of this software, I would appreciate an acknowledgement in the appropriate places. Should you find PapiEx useful, please consider making a contribution in the form of hardware, software or plain old cash.
PAPIEX(1) May 2004