Threads

USING PAPI WITH PARALLEL PROGRAMS

THREADS

WHAT ARE THREADS?

A thread is an independent flow of instructions that can be scheduled to run by the operating system. Multi-threaded programming is a form of parallel programming where several controlled threads are executing concurrently in the program. All threads execute in the same memory space, and can therefore work concurrently on shared data. Threads can run in parallel on several processors, allowing a single program to divide its work between several processors, thus running faster than a single-threaded program, which runs on only one processor at a time.
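As a minimal sketch of this idea, assuming POSIX threads are available, the following C code divides the sum 1..n among several threads. Every worker writes its partial result into an array owned by the calling thread, which is possible only because all threads share one memory space. The names parallel_sum_to, sum_slice, struct slice, and MAX_THREADS are illustrative and are not part of PAPI.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Work description for one thread: sum the integers in [first, last]. */
struct slice {
    long first;
    long last;
    long partial;   /* result written by the worker */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->partial = 0;
    for (long i = s->first; i <= s->last; i++)
        s->partial += i;
    return NULL;
}

/* Sum 1..n using nthreads threads; returns -1 if a thread cannot be created. */
long parallel_sum_to(long n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct slice work[MAX_THREADS];
    long chunk, total = 0;
    int t, started = 0, failed = 0;

    assert(n >= 0 && nthreads >= 1 && nthreads <= MAX_THREADS);
    chunk = n / nthreads;

    for (t = 0; t < nthreads; t++) {
        work[t].first = t * chunk + 1;
        /* the last thread also takes the remainder */
        work[t].last = (t == nthreads - 1) ? n : (t + 1) * chunk;
        if (pthread_create(&tid[t], NULL, sum_slice, &work[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* wait for every started worker, then combine the shared results */
    for (t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += work[t].partial;
    }
    return failed ? -1 : total;
}
```

For example, parallel_sum_to(100, 4) gives each of four threads a slice of 25 integers and returns 5050.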

PAPI supports thread-level measurements only with kernel (bound) threads, that is, threads that have a scheduling entity known to and handled by the operating system’s kernel. In most cases, such as with SMP or OpenMP compiler directives, bound threads are the default. Each thread is responsible for creating, starting, stopping, and reading its own counters. A newly created thread inherits no PAPI information from the thread that created it. Threads can be used with PAPI through threading packages or APIs such as Pthreads and OpenMP. With Pthreads, take care to set the contention scope attribute of each thread to PTHREAD_SCOPE_SYSTEM, unless the system is known to have a non-hybrid thread library implementation.
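One way to request system scope with Pthreads is sketched below, assuming a POSIX threads implementation. pthread_attr_setscope is the standard POSIX call; create_bound_thread, mark_done, and bound_thread_demo are hypothetical helper names introduced only for this illustration. On a system that does not support system scope, pthread_attr_setscope returns an error, which the helper passes back to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Create a thread with system contention scope.
 * Returns 0 on success or the error number reported by the failing
 * pthread call.
 */
int create_bound_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL && fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request a kernel-scheduled (bound) thread. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical worker used only to exercise the helper. */
static void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Returns 1 if a bound thread was created, ran, and was joined. */
int bound_thread_demo(void)
{
    pthread_t tid;
    int done = 0;

    if (create_bound_thread(&tid, mark_done, &done) != 0)
        return 0;
    if (pthread_join(tid, NULL) != 0)
        return 0;
    return done;
}
```

In a PAPI program, the worker passed to such a helper would be the place to create, start, stop, and read that thread's own counters.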

PAPI also works with unbound (non-kernel) threads, but the counts then reflect the total events for the whole process: a measurement taken in any thread returns the same values, namely the counts for the entire process. For unbound threads, it is not necessary to call PAPI_thread_init, which is discussed in the next section.

When threads are in use, PAPI lets the user register a routine that returns the ID of the currently running thread (for example, pthread_self for Pthreads). PAPI uses this thread ID as the lookup key for its internal data structures.
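A thread ID routine of this kind could look like the following sketch, which wraps pthread_self in a function that takes no arguments and returns an unsigned long. The names current_thread_id, store_id, and ids_are_distinct are hypothetical. The cast assumes that pthread_t is an integer or pointer type on the target platform; POSIX itself leaves pthread_t opaque, so this is not portable to every system.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Thread ID routine with the shape "unsigned long f(void)".
 * Assumption: pthread_t is an integer or pointer type on this platform,
 * so the cast is valid; POSIX itself leaves pthread_t opaque.
 */
unsigned long current_thread_id(void)
{
    return (unsigned long) pthread_self();
}

/* Worker that records its own ID in the caller's variable. */
static void *store_id(void *arg)
{
    *(unsigned long *)arg = current_thread_id();
    return NULL;
}

/* Returns 1 if a second thread reports a different ID than the caller. */
int ids_are_distinct(void)
{
    pthread_t tid;
    unsigned long other = 0;

    if (pthread_create(&tid, NULL, store_id, &other) != 0)
        return 0;
    pthread_join(tid, NULL);
    return other != current_thread_id();
}
```

Because the routine returns the same value every time it is called from one thread and a different value in another live thread, its result is usable as a per-thread lookup key.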


INITIALIZATION OF THREAD SUPPORT

Thread support in the PAPI library can be initialized by calling the following low-level function:

C:

 
PAPI_thread_init(handle)

 

Fortran:

 
PAPIF_thread_init(handle, check)

 


ARGUMENTS

handle -- Pointer to a routine that returns the current thread ID.

This function should be called only once, just after PAPI_library_init and before any other PAPI calls. If the function is called more than once, the application will exit. Applications that make no use of threads do not need to call this function.

The following example shows the correct syntax for using PAPI_thread_init with OpenMP:

 
C:

#include <papi.h>
#include <omp.h>

/* omp_get_thread_num returns int, so cast it to the handle type */
if (PAPI_thread_init((unsigned long (*)(void)) omp_get_thread_num) != PAPI_OK)
  handle_error(1);

 

Fortran:

 
#include "fpapi.h"
#include "omp.h"
EXTERNAL omp_get_thread_num
C Fortran requires that a procedure passed as an
C argument be declared EXTERNAL.
call PAPIF_thread_init(omp_get_thread_num, error)
 


On success, PAPI_thread_init returns PAPI_OK; on error, it returns a non-zero error code.

For a code example of using PAPI_thread_init with Pthreads, see the next section.


THREAD ID

The identifier of the current thread can be obtained by calling the following low-level function:

C:

 
PAPI_thread_id()

 

Fortran:

 
PAPIF_thread_id(check)
 
 

This function calls the thread id function registered by PAPI_thread_init and returns an unsigned long integer containing the thread identifier.

In the following code example, PAPI_thread_init and PAPI_thread_id are used to initialize thread support in the PAPI library and to acquire the identifier of the current thread, respectively, with Pthreads:

 
#include <papi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  unsigned long int tid;

  if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
    exit(1);

  /* pthread_self returns pthread_t, so cast it to the handle type */
  if (PAPI_thread_init((unsigned long (*)(void)) pthread_self) != PAPI_OK)
    exit(1);

  if ((tid = PAPI_thread_id()) == (unsigned long int)-1)
    exit(1);

  printf("Initial thread id is: %lu\n", tid);
  return 0;
}
 



POSSIBLE OUTPUT (THE ID VALUE IS PLATFORM DEPENDENT):
 
Initial thread id is: 0
 


On success, PAPI_thread_id returns a valid thread identifier; on error, it returns (unsigned long int) -1.

Four more utility functions related to threads are available in PAPI. They allow you to register a newly created thread so that PAPI can refer to it, to remove a registered thread in cases where thread IDs may be reused by the system, and to create and access thread-specific storage in a platform-independent fashion for use with PAPI. These functions are shown below:

C:

 
PAPI_register_thread()
PAPI_unregister_thread()
PAPI_get_thr_specific(tag, ptr)
PAPI_set_thr_specific(tag, ptr)

 


ARGUMENTS
 
tag -- Integer value specifying one of 4 storage locations.
ptr -- Pointer to the address of a data structure.
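The idea of a small, fixed number of tagged per-thread storage slots can be illustrated with POSIX thread-specific data. The sketch below is an analogy built on pthread_key_create, pthread_setspecific, and pthread_getspecific; it is not PAPI's implementation and does not call the PAPI functions listed above. The names slot_set, slot_get, slots_are_per_thread, and NUM_TAGS are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_TAGS 4   /* mirrors the four storage locations described above */

static pthread_key_t tag_keys[NUM_TAGS];
static pthread_once_t tag_once = PTHREAD_ONCE_INIT;
static int tag_init_failed = 0;

/* Create one key per tag; run exactly once through pthread_once. */
static void make_keys(void)
{
    for (int i = 0; i < NUM_TAGS; i++)
        if (pthread_key_create(&tag_keys[i], NULL) != 0)
            tag_init_failed = 1;
}

/* Store ptr in the calling thread's slot for tag; returns 0 on success. */
int slot_set(int tag, void *ptr)
{
    if (tag < 0 || tag >= NUM_TAGS)
        return -1;
    pthread_once(&tag_once, make_keys);
    if (tag_init_failed)
        return -1;
    return pthread_setspecific(tag_keys[tag], ptr) == 0 ? 0 : -1;
}

/* Fetch the calling thread's pointer for tag into *ptr; returns 0 on success. */
int slot_get(int tag, void **ptr)
{
    assert(ptr != NULL);
    if (tag < 0 || tag >= NUM_TAGS)
        return -1;
    pthread_once(&tag_once, make_keys);
    if (tag_init_failed)
        return -1;
    *ptr = pthread_getspecific(tag_keys[tag]);
    return 0;
}

/* Worker: returns its argument if slot 0 is empty in this new thread. */
static void *check_empty(void *arg)
{
    void *p = arg;              /* start non-NULL so the read is observable */
    int ok = (slot_get(0, &p) == 0 && p == NULL);
    return ok ? arg : NULL;
}

/* Returns 1 if a value set in this thread is invisible to a new thread. */
int slots_are_per_thread(void)
{
    static int value = 42;
    pthread_t tid;
    void *mine = NULL, *result = NULL;

    if (slot_set(0, &value) != 0 || slot_get(0, &mine) != 0)
        return 0;
    if (pthread_create(&tid, NULL, check_empty, &value) != 0)
        return 0;
    pthread_join(tid, &result);
    return mine == &value && result == &value;
}
```

Each thread sees only the pointers it stored itself, which is the property that makes such slots useful for keeping per-thread measurement state.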

 

For more code examples of using Pthreads and OpenMP with PAPI, see ctests/zero_pthreads.c and ctests/zero_omp.c, respectively, in the PAPI source distribution. For a code example of using SMP with PAPI, see ctests/zero_smp.c in the same distribution.


MPI

MPI is an acronym for Message Passing Interface. MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. MPI was designed for high performance on both massively parallel machines and on workstation clusters. More information on MPI can be found at http://www-unix.mcs.anl.gov/mpi.

PAPI supports MPI. When timers are used in applications that contain multiplexing, profiling, or overflow, MPI uses a virtual timer by default, and this timer must be converted to a real timer for the application to work properly. Otherwise, the application will exit.

Optionally, the supported tools TAU and SvPablo can be used to apply PAPI to MPI programs.

The following is a code example of using MPI’s PI program with PAPI:

 
#include <papi.h>
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  int done = 0, n, myid, numprocs, i, rc, retval, EventSet = PAPI_NULL;
  double PI25DT = 3.141592653589793238462643;
  double mypi, pi, h, sum, x, a;
  long_long values[1] = {(long_long) 0};

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&myid);

  /*Initialize the PAPI library */
  retval = PAPI_library_init(PAPI_VER_CURRENT);
  if (retval != PAPI_VER_CURRENT) {
    fprintf(stderr, "PAPI library init error!\n");
    exit(1);
  }

  /* Create an EventSet */
  if (PAPI_create_eventset(&EventSet) != PAPI_OK)
    handle_error(1);

  /* Add Total Instructions Executed to our EventSet */
  if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
    handle_error(1);

  /* Start counting */
  if (PAPI_start(EventSet) != PAPI_OK)
    handle_error(1);

  while (!done)
  {
    if (myid == 0) {
        printf("Enter the number of intervals: (0 quits) ");
        scanf("%d",&n);
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (n == 0) break;

    h   = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x*x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,MPI_COMM_WORLD);

    if (myid == 0)
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
  }

  /* Read the counters */
  if (PAPI_read(EventSet, values) != PAPI_OK)
    handle_error(1);

  printf("After reading counters: %lld\n", values[0]);

  /* Stop the counters */
  if (PAPI_stop(EventSet, values) != PAPI_OK)
    handle_error(1);
  printf("After stopping counters: %lld\n", values[0]);

  MPI_Finalize();
  return 0;
}
 



POSSIBLE OUTPUT (AFTER ENTERING 50, 75, AND 100 AS INPUT):
 
Enter the number of intervals: (0 quits) 50
pi is approximately 3.1416259869230028, Error is 0.0000333333332097
Enter the number of intervals: (0 quits) 75
pi is approximately 3.1416074684045965, Error is 0.0000148148148034
Enter the number of intervals: (0 quits) 100
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
Enter the number of intervals: (0 quits) 0
After reading counters: 117393
After stopping counters: 122921
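
The numerical part of the PI program is a midpoint-rule approximation of the integral of 4/(1+x^2) over [0,1], with the intervals dealt out to the processes in a strided pattern. The following sketch reproduces that arithmetic without MPI so that it can be checked in isolation. The function names partial_pi and combined_pi are hypothetical, and the loop in combined_pi only stands in for the summation that MPI_Reduce performs in the real program.

```c
#include <assert.h>

/*
 * Partial midpoint-rule sum for the integral of 4/(1+x^2) over [0,1],
 * covering intervals rank+1, rank+1+nprocs, ... as one process would.
 */
double partial_pi(int n, int rank, int nprocs)
{
    double h, sum = 0.0;

    assert(n > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    h = 1.0 / (double) n;
    for (int i = rank + 1; i <= n; i += nprocs) {
        double x = h * ((double) i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Add the partial results of all ranks, standing in for MPI_Reduce. */
double combined_pi(int n, int nprocs)
{
    double pi = 0.0;

    for (int rank = 0; rank < nprocs; rank++)
        pi += partial_pi(n, rank, nprocs);
    return pi;
}
```

With n = 50 the combined result agrees with the first value in the output above to within floating-point rounding, regardless of how many ranks the intervals are split across.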