ScaLAPACK instead of LAPACK

Open discussion regarding features, bugs, issues, vendors, etc.

ScaLAPACK instead of LAPACK

Postby ivanov » Tue Jan 10, 2006 12:37 pm

Hi guys,

For some time I have been interested writing small programs using LAPACK. Few weeks ago I installed ScaLAPACK on our cluster and I want to use it for the future. Unfortunately I'm not good enough in Fortran and I write code in C. I tried to read and understand the user manual and examples of ScaLAPACK, but I couldn't find a good example ( simple code) in C that can help me understand everything I need.
The biggest problem I encountered is distribution of a matrix to many processes, and obtaining the whole solution back to the root process.
Here it is a small example using Blas and Lapack to perform Cholesky factorization:

#include <stdio.h>
#include <getopt.h>
#include <math.h>
#include <stdlib.h>
#include <mpi.h>
#include <sys/time.h>
#include "my_include.h"

#define TINY 1.0e-10

int
main(int argc,char * argv[]) {

int N,info,i,j;
double *a,*pa,*c,*pc, alpha=1.0, beta=0.0;
char trans,up,lo;

up='U';
lo='L';
trans='N';

/*determine the matrix size*/
N = atoi(argv[1]);

/* creating the matrices */
a = malloc((N)*(N)*sizeof(double));
c = malloc((N)*(N)*sizeof(double));

if ( a == NULL || c == NULL){
printf(" Storage allocation Failed!\n");
exit(10);
}
pa=a;
pc=c;

/* now initialize the matrix a with psrandom numbers */
for (i=0;i<N;i++, pc++){
for (j=0;j<N;j++, pa++) {
*pa= 1+ (2.0*rand())/(RAND_MAX+1.0);
/* *(pc+j*N) = *pa; */
}
}
/* C:= A*A' i.e C symmetric and positive definite
using Blas */
dsyrk_( &lo, &trans, &N, &N, &alpha, a, &N, &beta, c, &N );

/* Cholesky Factorization using LAPACK */
dpotrf_ (&lo, &N, c, &N, &info);

pc=c;
for (i=0;i<N;i++)
for(j=0;j<N;j++,pc++)
printf(" Element c(%d,%d)= %.3f \n",j,i,*(pc));

exit(0);
}

Very basic, just to illustrate the use of the libs. As input obviously we have dimention of a matrix. In this small example I would like to use pdsyrk_ from PBLAS and and pdpotrf_ from ScaLAPACK. How should I do it properly?(Up to now I didn't succeed).
I know what the user guide says (the theoretical explanation), but for me is easier to understand it by working example. If somebody of you has some time and a will to convert the above example to one that uses ScaLAPACK in C, that would really make things more clear to me.
The system that I'm going to use is a Linux (Redhat) cluster with MPICH 1.6, all libs needed and ScaLAPACK itself are working properly and needed *.h files included.
Thanks!
ivanov
 
Posts: 4
Joined: Tue Jan 10, 2006 9:58 am
Location: Delft, The Nethrelands

Postby Julien Langou » Tue Jan 10, 2006 2:35 pm

Hello, below is a piece of C code that I am using sometimes, this uses LU not Cholesky, hope it can give you a good start. Julien

Code: Select all
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mpi.h"
#include <sys/time.h>

static int max( int a, int b ){
        if (a>b) return(a); else return(b);
}

#ifdef F77_WITH_NO_UNDERSCORE
#define   numroc_      numroc
#define   descinit_    descinit
#define   pdlamch_     pdlamch
#define   pdlange_     pdlange
#define   pdlacpy_     pdlacpy
#define   pdgesv_      pdgesv
#define   pdgemm_      pdgemm
#define   indxg2p_     indxg2p
#endif

extern void   Cblacs_pinfo( int* mypnum, int* nprocs);
extern void   Cblacs_get( int context, int request, int* value);
extern int    Cblacs_gridinit( int* context, char * order, int np_row, int np_col);
extern void   Cblacs_gridinfo( int context, int*  np_row, int* np_col, int*  my_row, int*  my_col);
extern void   Cblacs_gridexit( int context);
extern void   Cblacs_exit( int error_code);

extern int    numroc_( int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void   descinit_( int *desc, int *m, int *n, int *mb, int *nb, int *irsrc, int *icsrc,
            int *ictxt, int *lld, int *info);
extern double pdlamch_( int *ictxt , char *cmach);
extern double pdlange_( char *norm, int *m, int *n, double *A, int *ia, int *ja, int *desca, double *work);

extern void pdlacpy_( char *uplo, int *m, int *n, double *a, int *ia, int *ja, int *desca,
            double *b, int *ib, int *jb, int *descb);
extern void pdgesv_( int *n, int *nrhs, double *A, int *ia, int *ja, int *desca, int* ipiv,
            double *B, int *ib, int *jb, int *descb, int *info);
extern void pdgemm_( char *TRANSA, char *TRANSB, int * M, int * N, int * K, double * ALPHA,
            double * A, int * IA, int * JA, int * DESCA, double * B, int * IB, int * JB, int * DESCB,
            double * BETA, double * C, int * IC, int * JC, int * DESCC );
extern int  indxg2p_( int *indxglob, int *nb, int *iproc, int *isrcproc, int *nprocs);

int main(int argc, char **argv) {
   int iam, nprocs;
   int myrank_mpi, nprocs_mpi;
   int ictxt, nprow, npcol, myrow, mycol;
   int np, nq, n, nb, nqrhs, nrhs;
   int i, j, k, info, itemp, seed;
   int descA[9], descB[9];
   double *A, *Acpy, *B, *X, *R, eps, *work;
        double AnormF, XnormF, RnormF, BnormF, residF;
   int *ippiv;
   int izero=0,ione=1;
   double mone=(-1.0e0),pone=(1.0e0);
/**/
   double MPIt1, MPIt2, MPIelapsed;
/**/
   MPI_Init( &argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);
/**/
   n = 500; nrhs = 1; nprow = 2; npcol = 2; nb = 64;
/**/
   if (nb>n)
      nb = n;
   if (nprow*npcol>nprocs_mpi){
      if (myrank_mpi==0)
         printf(" **** ERROR : we do not have enough processes available to make a p-by-q process grid ***\n");
         printf(" **** Bye-bye                                                                         ***\n");
      MPI_Finalize(); exit(1);
   }
/**/
   Cblacs_pinfo( &iam, &nprocs ) ;
   Cblacs_get( -1, 0, &ictxt );
   Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
   Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );
/**/
/*
*
*     Work only the process in the process grid
*
*/
   if ((myrow>-1)&(mycol>-1)&(myrow<nprow)&(mycol<npcol)) {
/*
*
*     Compute the size of the local matrices (thanks to numroc)
*
*/
      np    = numroc_( &n   , &nb, &myrow, &izero, &nprow );
      nq    = numroc_( &n   , &nb, &mycol, &izero, &npcol );
      nqrhs = numroc_( &nrhs, &nb, &mycol, &izero, &npcol );
/*
*
*     Allocate and fill the matrices A and B
*
*/

      seed = iam*n*(n+nrhs); srand(seed);
/**/      
      A = (double *)calloc(np*nq,sizeof(double)) ;
      if (A==NULL){ printf("error of memory allocation A on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      Acpy = (double *)calloc(np*nq,sizeof(double)) ;
      if (Acpy==NULL){ printf("error of memory allocation Acpy on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      B = (double *)calloc(np*nqrhs,sizeof(double)) ;
      if (B==NULL){ printf("error of memory allocation B on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      X = (double *)calloc(np*nqrhs,sizeof(double)) ;
      if (X==NULL){ printf("error of memory allocation X on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      R = (double *)calloc(np*nqrhs,sizeof(double)) ;
      if (R==NULL){ printf("error of memory allocation R on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      ippiv = (int *)calloc(np+nb,sizeof(int)) ;
      if (ippiv==NULL){ printf("error of memory allocation IPIV on proc %dx%d\n",myrow,mycol); exit(0); }
/**/      
      k = 0;
      for (i = 0; i < np; i++) {
         for (j = 0; j < nq; j++) {
            A[k] = ((double) rand()) / ((double) RAND_MAX) - 0.5 ;
            k++;   
         }
      }
      k = 0;
      for (i = 0; i < np; i++) {
         for (j = 0; j < nqrhs; j++) {
            B[k] = ((double) rand()) / ((double) RAND_MAX) - 0.5 ;
            k++;   
         }
      }
/*
*
*     Initialize the array descriptor for the matrix A and B
*
*/
      itemp = max( 1, np );
      descinit_( descA, &n, &n   , &nb, &nb, &izero, &izero, &ictxt, &itemp, &info );
      descinit_( descB, &n, &nrhs, &nb, &nb, &izero, &izero, &ictxt, &itemp, &info );
/*
*
*     Make a copy of A and the rhs for checking purposes
*/
            pdlacpy_( "All", &n, &n   , A, &ione, &ione, descA, Acpy, &ione, &ione, descA );
            pdlacpy_( "All", &n, &nrhs, B, &ione, &ione, descB, X   , &ione, &ione, descB );
/*
**********************************************************************
*     Call ScaLAPACK PDGESV routine
**********************************************************************
*/
      if( iam==0 ) {
         printf("                                               \n");
         printf("***********************************************\n");
         printf("  Example of ScaLAPACK routine call: (PDGESV)  \n");
         printf("***********************************************\n");
         printf("                                               \n");
         printf("\tn = %d\tnrhs = %d\tprocess grid (%d,%d)\t with blocks: %dx%d\n",n,nrhs,nprow,npcol,nb,nb);
         printf("                                               \n");
      }
/**/
      MPIt1 = MPI_Wtime();
/**/
      pdgesv_( &n, &nrhs, A, &ione, &ione, descA, ippiv, X, &ione, &ione, descB, &info );
/**/
      MPIt2 = MPI_Wtime();
      MPIelapsed=MPIt2-MPIt1;
/**/
      if( iam==0 ) {
         printf("\ttime MPI     = %f s\n",MPIelapsed);
      }

/**/
      if( iam==0 ) {
         printf("                                               \n");
         printf("\tINFO code returned by PDGESV = %d              \n",info);
         printf("                                               \n");
      }
/*
*     Compute residual ||A * X  - B|| / ( ||X|| * ||A|| * eps * N )
*     Froebenius norm
*/
            pdlacpy_( "All", &n, &nrhs, B, &ione, &ione, descB, R   , &ione, &ione, descB );
            eps = pdlamch_( &ictxt, "Epsilon" );
      pdgemm_( "N", "N", &n, &nrhs, &n, &pone, Acpy, &ione, &ione, descA, X, &ione, &ione, descB,
            &mone, R, &ione, &ione, descB);
      AnormF = pdlange_( "F", &n, &n   , A, &ione, &ione, descA, work);
      BnormF = pdlange_( "F", &n, &nrhs, B, &ione, &ione, descB, work);
      XnormF = pdlange_( "F", &n, &nrhs, X, &ione, &ione, descB, work);
      RnormF = pdlange_( "F", &n, &nrhs, R, &ione, &ione, descB, work);
      residF = RnormF / ( AnormF * XnormF * eps * ((double) n));
/**/
      if ( iam==0 ){
         printf("                                                        \n");
         printf("\t||A * X  - B||_F / ( ||X||_F * ||A||_F * eps * N ) = %e \n",residF);
         printf("                                                        \n");
         if (residF<10.0e+0)
            printf("\tThe answer is correct.                            \n");
         else
            printf("\tThe answer is suspicious.                         \n");
      }
/**/
      free(A);
      free(Acpy);
      free(B);
      free(X);
      free(ippiv);
      Cblacs_gridexit( 0 );
   }
/*
*     Print ending messages
*/
   if (iam==0){
      printf("END OF TESTS\n");
      printf("***********************************************\n");
      printf("                                               \n");
   }
   MPI_Finalize();
   exit(0);
}
Julien Langou
 
Posts: 828
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Thanks!

Postby ivanov » Tue Jan 10, 2006 5:56 pm

Thanks a lot for your fast reply Julien!
I think your example is going to help me to solve the problems I have.
ivanov
 
Posts: 4
Joined: Tue Jan 10, 2006 9:58 am
Location: Delft, The Nethrelands

Feeding matrices to ScaLAPACK

Postby rvdg » Mon Mar 06, 2006 5:24 pm

Notice that the example that Julien sent doesn't really help you fill matrices with real values. The fundamental problem is discussed in

H. Carter Edwards and Robert A. van de Geijn. " Application Interface to Parallel Dense Matrix Libraries: Just let me solve my problem!" ACM Transactions on Mathematical Software, submitted.

available from http://www.cs.utexas.edu/users/plapack/new/pubs.html

Regards,
Robert van de Geijn
rvdg
 
Posts: 8
Joined: Mon Feb 13, 2006 7:56 pm

Postby Victor Eijkhout » Thu Mar 09, 2006 12:38 pm

Methinks Robert has a point: that matrix in Julien's example gets filled with random values. There's no indication there how to fill the matrix with elements that are a function of i and j.

The following code, while inefficient, at least gives you processor, block, and coordinate-in-block, as a function of global i and j.

Code: Select all
    p_of_i(i,bs,p) = mod(int((i-1)/bs),p)
    l_of_i(i,bs,p) = int((i-1)/(p*bs))
    x_of_i(i,bs,p) = mod(i-1,bs)+1

    pi = p_of_i(i,bs_i,nprows)
    pi = p_of_i(j,bs_j,pcols)
c likewise for xi = x_of_i(...), xj, li, lj
    if (pi.eq.myprow .and. pj.eq.mypcol) then
       mat(li*bs_i+xi,lj*bs_j+xj) = mat_global(i,j)


Btw, the first three lines are fortran statement functions. In C you'd make them cpp macros.

Victor.
Victor Eijkhout
 
Posts: 4
Joined: Thu Dec 09, 2004 12:19 am
Location: Austin, TX

Postby ivanov » Mon May 29, 2006 7:42 pm

Hi Guys,

You've been quite helpful last time, so I have another question for you :).
In the software I write I need to form a distributed matrix and 1D cyclic rows distribution fits perfectly due to the way I have to compute it. 2D block cyclic is just too slow for this particular case. And here it comes the problem for me, how to redistribute the 1D row cyclic matrix to 2D block cyclic with no need to copy the whole matrix to one node (assume there is no enough memory to to fit it). I just need to find the Cholesky factorization of this matrix on next step, so I prefer to have 2D distribution.
My question is if there is a buid-in function in ScaLAPACK to do this transformation for me? (I mean to copy 1D cyclic to another 2D block cyclic matrix)

Ivan
ivanov
 
Posts: 4
Joined: Tue Jan 10, 2006 9:58 am
Location: Delft, The Nethrelands

Postby Julien Langou » Tue May 30, 2006 9:54 am

SUBROUTINE PDGEMR2D( M, N, A, IA, JA, ADESC, B, IB, JB, BDESC, CTXT)

Purpose
=======

PDGEMR2D copies a submatrix of A on a submatrix of B.
A and B can have different distributions: they can be on different
processor grids, they can have different blocksizes, the beginning
of the area to be copied can be at a different places on A and B.


The documentation is in the header of the file: REDIST/SRC/pdgemr.c
and an example of use (not what you ask) is given at:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=50

Let us know if you have problem using the routine,
Julien
Julien Langou
 
Posts: 828
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

rows to cols

Postby ivanov » Mon Jun 19, 2006 6:52 pm

Hi Julien,

Of course pdgemr2d solved my problem. I had it several times with wrong input arguments and the strange results pushed me to ask just in case if it's going to do what I needed. This problem was solved pretty easy with some extra attention in calling pdgemr2d.
I have unfortunately another issue that was making my life miserable today.
Suppose we have a matrix that is again distributed 1D cyclic on its rows (nb=1). The matrix is square and should be symmetric, so I don't need the lower diagonal. Unfortunately, the way of computing it I come up with few only rows (sometimes) in the lower triangle that have some non-zero value. These have to be added later symmetrically to the upper triangle.
I decided to try to use PDROW2COL to extract the nonzero row I need, and copy it to a distributed vector over the processing rows. Then is piece of cake to add it to the matrix by PDMATADD to the column of choice.
Everything would work fine if I can make PDROW2COL to use only one of the local rows (1 global row is property of only one process because ot the distribution). Instead, PDROW2COL is giving me next N elements from the memory block after the pointer I specify as input (and these values are from other local rows of course) instead of N values from 1 row. One solution is to have another local vector to copy the values localy and then to apply PDROW2COL to that vector, but maybe there exist better solution using another function.
Thus, my question is are there other way to do what I need using in-built functions in ScaLAPACK?

Ivan
ivanov
 
Posts: 4
Joined: Tue Jan 10, 2006 9:58 am
Location: Delft, The Nethrelands

Re: ScaLAPACK instead of LAPACK

Postby ispmarin » Fri Nov 27, 2009 1:56 pm

Hello all,

This thread is very helpful, as I'm now trying to distribute my total matrix to the processes to use pdgesv. But I still have some doubts on the total vs local approach to the matrix distribution. The pdgesv documentation says that

PDGESV computes the solution to a real system of linear equations
*
* sub( A ) * X = sub( B ),
*
* where sub( A ) = A(IA:IA+N-1,JA:JA+N-1) is an N-by-N distributed
* matrix and X and sub( B ) = B(IB:IB+N-1,JB:JB+NRHS-1) are N-by-NRHS
* distributed matrices.


So I have to provide the routine with a local matrix, a block from the total matrix. This local matrix have locR*locC size, where locC and locR is given by numroc. But then

* IA (global input) INTEGER
* The row index in the global array A indicating the first
* row of sub( A ).
*
* JA (global input) INTEGER
* The column index in the global array A indicating the
* first column of sub( A ).

So I should provide pdgesv the local index, that in C++ will be always zero, or the index that that specific block starts in the total matrix?

Thank you!
ispmarin
 
Posts: 8
Joined: Sat May 31, 2008 3:02 pm


Return to User Discussion

Who is online

Users browsing this forum: No registered users and 5 guests