PAPIC:Parallel Programming

Using PAPI with Parallel Programs

Threads

A thread is an independent flow of instructions that can be scheduled to run by the operating system. Multi-threaded programming is a form of parallel programming in which several threads of control execute concurrently within one program. All threads execute in the same memory space and can therefore work concurrently on shared data. Threads can run in parallel on several processors, allowing a single program to divide its work among them and thus run faster than a single-threaded program, which runs on only one processor at a time.

In PAPI, each thread is responsible for the creation, start, stop, and read of its own counters. When a thread is created, it inherits no PAPI information or state from the calling thread unless explicitly specified.

On systems with hybrid (M:N) thread libraries, the user should take care to set the scope of each thread to the PTHREAD_SCOPE_SYSTEM attribute, unless the system is known to have a non-hybrid (1:1) thread library implementation. PAPI does not support unbound or user-level threads explicitly, but it should still work, and the counts will reflect totals for the underlying bound kernel thread. PAPI supports threading in a library-agnostic way by allowing the user to specify the function that returns the current thread ID. For nearly all platforms, this will be the pthread_self function. If your system has some other way of identifying the unique kernel thread that owns a PMU context, that function should be specified instead.

Initialization of Thread Support

Thread support in the PAPI library can be initialized by calling the following low-level function in C:

C:

int PAPI_thread_init(unsigned long(*handle)(void));

Arguments

handle Pointer to a routine that returns the current thread ID as an unsigned long.

This function should be called only once, after PAPI_library_init and before any other PAPI calls. Applications that make no use of threads do not need to call this function.

Thread ID

The identifier of the current thread can be obtained by calling the following low-level function:

C:

unsigned long PAPI_thread_id(void);

Fortran:

PAPIF_thread_id(check)

This function calls the thread id function registered by PAPI_thread_init and returns an unsigned long integer containing the thread identifier.

Example

In the following code example, PAPI_thread_init() and PAPI_thread_id() are used to initialize thread support in the PAPI library and to acquire the identifier of the current thread, respectively. For Pthreads applications, pthread_self is the usual argument to PAPI_thread_init; an OpenMP application would instead pass a function that returns omp_get_thread_num() as an unsigned long.

#include <papi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void handle_error(int retval)
{
  printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
  exit(1);
}

int main(void)
{
  int retval;
  unsigned long tid;

  retval = PAPI_library_init(PAPI_VER_CURRENT);
  if (retval != PAPI_VER_CURRENT) handle_error(retval);

  /* pthread_self returns a pthread_t, so cast it to the
     unsigned long (*)(void) signature PAPI_thread_init expects */
  retval = PAPI_thread_init((unsigned long (*)(void)) pthread_self);
  if (retval != PAPI_OK) handle_error(retval);

  if ((tid = PAPI_thread_id()) == (unsigned long) -1)
    handle_error(1);

  printf("Initial thread id is: %lu\n", tid);
  return 0;
}

Output

Initial thread id is: 0

On success, PAPI_thread_id returns a valid thread identifier; on error, (unsigned long) -1 is returned.

Thread Utilities

Four more utility functions related to threads are available in PAPI. These functions allow you to register a newly created thread to make it available for reference by PAPI, to remove a registered thread in cases where thread IDs may be reused by the system, and to create and access thread-specific storage in a platform-independent fashion for use with PAPI. These functions are shown below in C; the first two are also available in Fortran:

int PAPI_register_thread(void);
int PAPI_unregister_thread(void);
int PAPI_get_thr_specific(int tag, void **ptr);
int PAPI_set_thr_specific(int tag, void *ptr);

Arguments

tag Integer value specifying one of 4 storage locations.
ptr Pointer to the address of a data structure.

For more code examples of using Pthreads and OpenMP with PAPI, see ctests/zero_pthreads.c and ctests/zero_omp.c, respectively. Also, for a code example of using SMP with PAPI, see ctests/zero_smp.c.

MPI

MPI is an acronym for Message Passing Interface. MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. MPI was designed for high performance on both massively parallel machines and on workstation clusters. More information on MPI can be found at www-unix.mcs.anl.gov/mpi, or by googling "MPI".

PAPI supports MPI. Note that PAPI uses a virtual timer by default for multiplexing, profiling, and overflow; under some MPI implementations this must be converted to a real timer for the application to work properly. Otherwise, the application will exit.

Optionally, several supported tools, including TAU, can be used to combine PAPI with MPI. The following is a code example that adds PAPI to the classic MPI pi-calculation program:

#include <papi.h>
#include <mpi.h>
#include <math.h>
#include <stdio.h>
 
void handle_error(int retval)
{
  printf("PAPI error %d: %s\n", retval, PAPI_strerror(retval));
  MPI_Abort(MPI_COMM_WORLD, retval);
}

int main(int argc, char *argv[])
{
  int done = 0, n, myid, numprocs, i, retval, EventSet = PAPI_NULL;
  double PI25DT = 3.141592653589793238462643;
  double mypi, pi, h, sum, x;
  long long values[1] = {0};

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  /* Initialize the PAPI library */
  retval = PAPI_library_init(PAPI_VER_CURRENT);
  if (retval != PAPI_VER_CURRENT) handle_error(retval);
 
  /* Create an EventSet */
  retval = PAPI_create_eventset(&EventSet);
  if (retval != PAPI_OK) handle_error(retval);
 
  /* Add Total Instructions Executed to our EventSet */
  retval = PAPI_add_event(EventSet, PAPI_TOT_INS);
  if (retval != PAPI_OK) handle_error(retval);
 
  /* Start counting */
  retval = PAPI_start(EventSet);
  if (retval != PAPI_OK) handle_error(retval);
 
  while (!done)
  {
    if (myid == 0) {
        printf("Enter the number of intervals: (0 quits) ");
        scanf("%d",&n);
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (n == 0) break;
 
    h   = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x*x);
    }
    mypi = h * sum;
 
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,MPI_COMM_WORLD);
 
    if (myid == 0)
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
  }
 
   /* Read the counters */
   retval = PAPI_read(EventSet, values);
   if (retval != PAPI_OK) handle_error(retval);
 
   printf("After reading counters: %lld\n",values[0]);
 
   /* Stop the counters */
   retval = PAPI_stop(EventSet, values);
   if (retval != PAPI_OK) handle_error(retval);
   printf("After stopping counters: %lld\n",values[0]);
 
   MPI_Finalize();
   return 0;
}

Possible Output

(after entering 50, 75, and 100 as input)

Enter the number of intervals: (0 quits) 50
pi is approximately 3.1416259869230028, Error is 0.0000333333332097
Enter the number of intervals: (0 quits) 75
pi is approximately 3.1416074684045965, Error is 0.0000148148148034
Enter the number of intervals: (0 quits) 100
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
Enter the number of intervals: (0 quits) 0
After reading counters: 117393
After stopping counters: 122921