Can ScaLAPACK (PBLAS+BLACS) be used with explicit MPI?


Post by phymilton » Wed Jan 04, 2006 6:23 pm

ScaLAPACK is built on PBLAS and BLACS, which form a communication layer separate from MPI. BLACS has its own initialization and shutdown subroutines, so in a ScaLAPACK code there is no explicit MPI: ScaLAPACK treats all the available CPUs as if they were one big CPU with all the memory attached to it. I can specify how many CPUs my ScaLAPACK program uses (for example 4, 16, ...).

My question is: can I write a code that uses both ScaLAPACK and explicit MPI? That is, group the CPUs in sets of 4, have each group run my ScaLAPACK code, and at the same time use MPI in the code to assign work to the different groups of CPUs.

For example, my code needs a lot of memory (one matrix is larger than 4 GB), so I have to use ScaLAPACK to distribute it. At the same time my code has a main loop whose iterations are independent, so I could very easily use MPI to run them in parallel.

If I have 16 CPUs available, I can group them 4 CPUs to a unit, which gives four sets of 4 CPUs. Can I use both MPI and ScaLAPACK so that my code automatically assigns 4 CPUs to each ScaLAPACK solve, with the ScaLAPACK code running in parallel on the four different sets of 4 CPUs?

This sounds like a "hybrid" parallel computation. Has anyone solved a similar problem and can give me some suggestions?

Thank you very much!
phymilton
 
Posts: 19
Joined: Mon Jan 24, 2005 11:41 pm
Location: Ames, IA

Post by Julien Langou » Tue Jan 10, 2006 4:19 pm

can I write a code that uses both ScaLAPACK and explicit MPI? That is, group the CPUs in sets of 4, have each group run my ScaLAPACK code, and at the same time use MPI in the code to assign work to the different groups of CPUs.


You can use BLACS_GRIDMAP

http://www.netlib.org/blacs/BLACS/QRef.html#BLACS_GRIDMAP

to do what you want.

For example, the following code executes test_pdgesv on the two processes of ranks 0 and 2, arranged in a 1-by-2 process grid, and at the same time executes test_pdgesv on the two processes of ranks 1 and 3, arranged in a 2-by-1 process grid. The other processes (if any) are not used. test_pdgesv can be a routine that does whatever you want.
Here I am testing PDGESV with the parameters:
n1 = 500; nrhs1 = 1; nprow1 = 1; npcol1 = 2; nb1 = 64;
n2 = 250; nrhs2 = 2; nprow2 = 2; npcol2 = 1; nb2 = 50;
It is convenient to pass the context as an argument of this routine.

If you are used to MPI, it helps to think of a BLACS context as a communicator: every BLACS communication has to specify its context.

This extends to ScaLAPACK: each ScaLAPACK matrix carries its context as well (in its descriptor descA).

Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mpi.h"
#include <sys/time.h>

static int max( int a, int b ){
        if (a>b) return(a); else return(b);
}

#ifdef F77_WITH_NO_UNDERSCORE
#define   numroc_      numroc
#define   descinit_    descinit
#define   pdlamch_     pdlamch
#define   pdlange_     pdlange
#define   pdlacpy_     pdlacpy
#define   pdgesv_      pdgesv
#define   pdgemm_      pdgemm
#define   indxg2p_     indxg2p
#endif

extern void   Cblacs_pinfo( int* mypnum, int* nprocs);
extern void   Cblacs_get( int context, int request, int* value);
extern int    Cblacs_gridinit( int* context, char * order, int np_row, int np_col);
extern void   Cblacs_gridinfo( int context, int*  np_row, int* np_col, int*  my_row, int*  my_col);
extern void   Cblacs_gridexit( int context);
extern void   Cblacs_exit( int error_code);
extern void   Cblacs_gridmap( int* context, int* usermap, int ldumap, int nprow, int npcol);

extern int    numroc_( int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void   descinit_( int *desc, int *m, int *n, int *mb, int *nb, int *irsrc, int *icsrc,
            int *ictxt, int *lld, int *info);
extern double pdlamch_( int *ictxt , char *cmach);
extern double pdlange_( char *norm, int *m, int *n, double *A, int *ia, int *ja, int *desca, double *work);

extern void pdlacpy_( char *uplo, int *m, int *n, double *a, int *ia, int *ja, int *desca,
            double *b, int *ib, int *jb, int *descb);
extern void pdgesv_( int *n, int *nrhs, double *A, int *ia, int *ja, int *desca, int* ipiv,
            double *B, int *ib, int *jb, int *descb, int *info);
extern void pdgemm_( char *TRANSA, char *TRANSB, int * M, int * N, int * K, double * ALPHA,
            double * A, int * IA, int * JA, int * DESCA, double * B, int * IB, int * JB, int * DESCB,
            double * BETA, double * C, int * IC, int * JC, int * DESCC );
extern int  indxg2p_( int *indxglob, int *nb, int *iproc, int *isrcproc, int *nprocs);

extern int  test_pdgesv( int n, int nrhs, int nprow, int npcol, int nb, int ictxt);

int main(int argc, char **argv) {
   int iam, nprocs;
   int myrank_mpi, nprocs_mpi;
   int ictxt1, nprow1, npcol1, myrow1, mycol1;
   int ictxt2, nprow2, npcol2, myrow2, mycol2;
   int n1, nb1, nrhs1;
   int n2, nb2, nrhs2;
   int *usermap1, *usermap2, ldumap1, ldumap2;
/**/
   MPI_Init( &argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);
/**/
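   /* Get the BLACS process number and two copies of the default (system)
      context; each copy is remapped below to its own process grid. */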
   Cblacs_pinfo( &iam, &nprocs ) ;
   Cblacs_get( -1, 0, &ictxt1 );
   Cblacs_get( -1, 0, &ictxt2 );
/**/
   n1 = 500; nrhs1 = 1; nprow1 = 1; npcol1 = 2; nb1 = 64; ldumap1 = nprow1;
   n2 = 250; nrhs2 = 2; nprow2 = 2; npcol2 = 1; nb2 = 50; ldumap2 = nprow2;
/**/
   usermap1 = (int *)calloc(ldumap1*npcol1,sizeof(int)) ;
   usermap2 = (int *)calloc(ldumap2*npcol2,sizeof(int)) ;
/**/
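   /* The usermap entries are BLACS process numbers (with the MPI BLACS these
      are the ranks in MPI_COMM_WORLD): grid 1 is built from ranks 0 and 2,
      grid 2 from ranks 1 and 3. */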
   usermap1[0] = 0; usermap1[1] = 2;
   usermap2[0] = 1; usermap2[1] = 3;
/**/
   if ((nprow1*npcol1)+(nprow2*npcol2)>nprocs_mpi){
      if (myrank_mpi==0)
         printf(" **** ERROR : I want to use %d processes and only have %d\n",(nprow1*npcol1)+(nprow2*npcol2),nprocs_mpi);
      MPI_Finalize(); exit(1);
   }
/**/
   Cblacs_gridmap ( &ictxt1, usermap1, ldumap1, nprow1, npcol1 );
   free(usermap1);
   Cblacs_gridmap ( &ictxt2, usermap2, ldumap2, nprow2, npcol2 );
   free(usermap2);
/**/
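   /* A process belongs to at most one of the two grids; for a context that is
      not valid on a process, Cblacs_gridinfo returns -1 coordinates, which is
      what the tests below detect. */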
   myrow1=-1;mycol1=-1;myrow2=-1;mycol2=-1;
   Cblacs_gridinfo( ictxt1, &nprow1, &npcol1, &myrow1, &mycol1 );
   Cblacs_gridinfo( ictxt2, &nprow2, &npcol2, &myrow2, &mycol2 );
/**/
   if ((myrow1>-1)&(mycol1>-1)&(myrow1<nprow1)&(mycol1<npcol1)) {
      test_pdgesv( n1, nrhs1, nprow1, npcol1, nb1, ictxt1);
   }
   if ((myrow2>-1)&(mycol2>-1)&(myrow2<nprow2)&(mycol2<npcol2)) {
      test_pdgesv( n2, nrhs2, nprow2, npcol2, nb2, ictxt2);
   }
/**/
   if ((myrow1>-1)&(mycol1>-1)&(myrow1<nprow1)&(mycol1<npcol1))
      Cblacs_gridexit( ictxt1 );
   if ((myrow2>-1)&(mycol2>-1)&(myrow2<nprow2)&(mycol2<npcol2))
      Cblacs_gridexit( ictxt2 );

   MPI_Finalize();
   exit(0);
}



Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mpi.h"
#include <sys/time.h>

static int max( int a, int b ){
        if (a>b) return(a); else return(b);
}

#ifdef F77_WITH_NO_UNDERSCORE
#define   numroc_      numroc
#define   descinit_    descinit
#define   pdlamch_     pdlamch
#define   pdlange_     pdlange
#define   pdlacpy_     pdlacpy
#define   pdgesv_      pdgesv
#define   pdgemm_      pdgemm
#define   indxg2p_     indxg2p
#endif

extern void   Cblacs_pinfo( int* mypnum, int* nprocs);
extern void   Cblacs_get( int context, int request, int* value);
extern int    Cblacs_gridinit( int* context, char * order, int np_row, int np_col);
extern void   Cblacs_gridinfo( int context, int*  np_row, int* np_col, int*  my_row, int*  my_col);
extern void   Cblacs_gridexit( int context);
extern void   Cblacs_exit( int error_code);

extern int    numroc_( int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void   descinit_( int *desc, int *m, int *n, int *mb, int *nb, int *irsrc, int *icsrc,
            int *ictxt, int *lld, int *info);
extern double pdlamch_( int *ictxt , char *cmach);
extern double pdlange_( char *norm, int *m, int *n, double *A, int *ia, int *ja, int *desca, double *work);

extern void pdlacpy_( char *uplo, int *m, int *n, double *a, int *ia, int *ja, int *desca,
            double *b, int *ib, int *jb, int *descb);
extern void pdgesv_( int *n, int *nrhs, double *A, int *ia, int *ja, int *desca, int* ipiv,
            double *B, int *ib, int *jb, int *descb, int *info);
extern void pdgemm_( char *TRANSA, char *TRANSB, int * M, int * N, int * K, double * ALPHA,
            double * A, int * IA, int * JA, int * DESCA, double * B, int * IB, int * JB, int * DESCB,
            double * BETA, double * C, int * IC, int * JC, int * DESCC );
extern int  indxg2p_( int *indxglob, int *nb, int *iproc, int *isrcproc, int *nprocs);

int test_pdgesv( int n, int nrhs, int nprow, int npcol, int nb, int ictxt){
   int iam, nprocs;
   int myrow, mycol;
   int np, nq, nqrhs;
   int i, j, k, info, itemp, seed;
   int descA[9], descB[9];
   double *A, *Acpy, *B, *X, *R, eps, *work = NULL;
   double AnormF, XnormF, RnormF, BnormF, residF;
   int *ippiv;
   int izero=0,ione=1;
   double mone=(-1.0e0),pone=(1.0e0);
/**/
   double MPIt1, MPIt2, MPIelapsed;
/**/
   Cblacs_pinfo( &iam, &nprocs ) ;
   Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

   if (nb>n) nb = n;
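
   /* Local (per-process) numbers of rows and columns of A and B under the
      block-cyclic distribution. */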

   np    = numroc_( &n   , &nb, &myrow, &izero, &nprow );
   nq    = numroc_( &n   , &nb, &mycol, &izero, &npcol );
   nqrhs = numroc_( &nrhs, &nb, &mycol, &izero, &npcol );

   seed = iam*n*(n+nrhs); srand(seed);
   A = (double *)calloc(np*nq,sizeof(double)) ;
   if (A==NULL){ printf("error of memory allocation A on proc %dx%d\n",myrow,mycol); exit(0); }
   Acpy = (double *)calloc(np*nq,sizeof(double)) ;
   if (Acpy==NULL){ printf("error of memory allocation Acpy on proc %dx%d\n",myrow,mycol); exit(0); }
   B = (double *)calloc(np*nqrhs,sizeof(double)) ;
   if (B==NULL){ printf("error of memory allocation B on proc %dx%d\n",myrow,mycol); exit(0); }
   X = (double *)calloc(np*nqrhs,sizeof(double)) ;
   if (X==NULL){ printf("error of memory allocation X on proc %dx%d\n",myrow,mycol); exit(0); }
   R = (double *)calloc(np*nqrhs,sizeof(double)) ;
   if (R==NULL){ printf("error of memory allocation R on proc %dx%d\n",myrow,mycol); exit(0); }
   ippiv = (int *)calloc(np+nb,sizeof(int)) ;
   if (ippiv==NULL){ printf("error of memory allocation IPIV on proc %dx%d\n",myrow,mycol); exit(0); }
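
   /* Fill the local pieces of A and B with random entries in [-0.5, 0.5]. */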

   k = 0;
   for (i = 0; i < np; i++) {
      for (j = 0; j < nq; j++) {
         A[k] = ((double) rand()) / ((double) RAND_MAX) - 0.5 ;
         k++;   
      }
   }
   k = 0;
   for (i = 0; i < np; i++) {
      for (j = 0; j < nqrhs; j++) {
         B[k] = ((double) rand()) / ((double) RAND_MAX) - 0.5 ;
         k++;   
      }
   }
   itemp = max( 1, np );
   descinit_( descA, &n, &n   , &nb, &nb, &izero, &izero, &ictxt, &itemp, &info );
   descinit_( descB, &n, &nrhs, &nb, &nb, &izero, &izero, &ictxt, &itemp, &info );

   pdlacpy_( "All", &n, &n   , A, &ione, &ione, descA, Acpy, &ione, &ione, descA );
   pdlacpy_( "All", &n, &nrhs, B, &ione, &ione, descB, X   , &ione, &ione, descB );

   MPIt1 = MPI_Wtime();
   pdgesv_( &n, &nrhs, A, &ione, &ione, descA, ippiv, X, &ione, &ione, descB, &info );
   MPIt2 = MPI_Wtime();
   MPIelapsed=MPIt2-MPIt1;
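
   /* Check the solution: form the residual R = A*X - B using the saved copy
      of A, then the scaled residual ||R||_F / ( ||A||_F * ||X||_F * eps * n ). */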

   pdlacpy_( "All", &n, &nrhs, B, &ione, &ione, descB, R   , &ione, &ione, descB );
   eps = pdlamch_( &ictxt, "Epsilon" );
   pdgemm_( "N", "N", &n, &nrhs, &n, &pone, Acpy, &ione, &ione, descA, X, &ione, &ione, descB,
         &mone, R, &ione, &ione, descB);
   AnormF = pdlange_( "F", &n, &n   , Acpy, &ione, &ione, descA, work); /* A holds the LU factors after pdgesv; use the saved copy */
   BnormF = pdlange_( "F", &n, &nrhs, B, &ione, &ione, descB, work);
   XnormF = pdlange_( "F", &n, &nrhs, X, &ione, &ione, descB, work);
   RnormF = pdlange_( "F", &n, &nrhs, R, &ione, &ione, descB, work);
   residF = RnormF / ( AnormF * XnormF * eps * ((double) n));

   if ( (myrow==0)&(mycol==0) ){
      printf("\tn = %d\tnrhs = %d\tprocess grid (%d,%d)\t with blocks (%dx%d) : res = %e, info = %d, time = %f s\n",
            n,nrhs,nprow,npcol,nb,nb,residF,info,MPIelapsed);
   }

   free(A);
   free(Acpy);
   free(B);
   free(X);
   free(R);
   free(ippiv);

   return 0;
}


Julien Langou
 
Posts: 733
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Thank you very much.

Post by phymilton » Tue Jan 10, 2006 7:01 pm

Thanks a lot for giving me an example of how to do this. I usually program in Fortran 90, so I will try to redo what you did in this example in Fortran.

Have a nice day!
phymilton
 
Posts: 19
Joined: Mon Jan 24, 2005 11:41 pm
Location: Ames, IA

Re: Thank you very much.

Post by Beregond » Mon Sep 24, 2007 12:45 pm

I have a problem similar to the one discussed in this thread -- I have an existing MPI F90 program and want to use ScaLAPACK for the linear algebra part. An additional complication is that I have multiple communicators, and I would like each of them to perform independent operations. This means that the above examples do not apply directly to my case, right?

So, first, is this possible at all, and second, are there any example programs out there that might help me?

Thank you in advance for your help!
Beregond
 
Posts: 1
Joined: Mon Sep 24, 2007 12:38 pm

Post by Julien Langou » Mon Sep 24, 2007 12:57 pm

Hello, I have done this several times from C and it works fine. I have no
idea whether it works from F90, but there is no reason why it should not. The
relevant reference is "Outstanding Issues in the MPIBLACS" by R. Clint Whaley. Julien.
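
A minimal sketch of that approach for sub-communicators created with MPI_Comm_split: it relies on the MPIBLACS support routine Csys2blacs_handle, described in Whaley's note, to translate an MPI communicator into a BLACS handle on which a grid can then be initialized. The group size of 4 and the 2-by-2 grid shape are only illustrative.

Code:
#include <stdio.h>
#include "mpi.h"

/* MPIBLACS support routine: returns a BLACS handle for an MPI communicator. */
extern int  Csys2blacs_handle( MPI_Comm comm );
extern void Cblacs_gridinit( int* context, char* order, int np_row, int np_col );
extern void Cblacs_gridinfo( int context, int* np_row, int* np_col, int* my_row, int* my_col );
extern void Cblacs_gridexit( int context );

int main(int argc, char **argv) {
   int rank, size, color, ictxt, nprow, npcol, myrow, mycol;
   MPI_Comm subcomm;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   MPI_Comm_size( MPI_COMM_WORLD, &size );

   if ( size % 4 != 0 ) {
      if ( rank == 0 ) printf( "this sketch assumes the number of processes is a multiple of 4\n" );
      MPI_Finalize(); return 1;
   }

   /* Split MPI_COMM_WORLD into groups of 4 processes; each group gets its
      own communicator and will hold an independent 2-by-2 BLACS grid. */
   color = rank / 4;
   MPI_Comm_split( MPI_COMM_WORLD, color, rank, &subcomm );

   /* Translate the sub-communicator into a BLACS handle, then turn the
      handle into a process grid; Cblacs_gridinit overwrites the handle
      with the new grid context. */
   ictxt = Csys2blacs_handle( subcomm );
   nprow = 2; npcol = 2;
   Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
   Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

   /* Each group can now call ScaLAPACK independently of the others, with
      descriptors built on ictxt, e.g. something like
      test_pdgesv( n, nrhs, nprow, npcol, nb, ictxt ) from the example above. */

   Cblacs_gridexit( ictxt );
   MPI_Comm_free( &subcomm );
   MPI_Finalize();
   return 0;
}
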
Julien Langou
 
Posts: 733
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

