ScaLAPACK Archives

[Scalapack] matrix inverse


hello scalapackers,
hello Desheng,

here are a bug-fixed version of pdgetri.f and the associated pdinvdriver.f 
(same modification has been applied to s,c, and z) any comments welcome, 
we'll release this in scalapack-1.7.4 after 1-2 days.

-julien


On Thu, 4 May 2006, Julien Langou wrote:


Hello Desheng,

the patch is in revision mode. After tested it on 64-bit platform, here is a 
better revision:

              LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
    $                  + MOD ( IA - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
    $                  MYCOL, DESCA( CSRC_ ), NPCOL ) +
    $                  MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
    $                  NUMROC( DESCA( M_ ) + DESCA( MB_) * NPROW,
    $                  DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
    $                  DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_) )

and the pdgetri.f with the modification is attached.

You can use it right now and this should replace my previous email.

Still testing this. Please let me know if that works on your side.

Julien.

On Thu, 4 May 2006, Julien Langou wrote:


Hello Desheng,

yes there seems to have a problem with PxGETRI with rectangular grids when 
one uses workspace query to get the optimal size of IWORK. The value 
returned in IWORK(1) after a workspace query (LWORK=-1) is too small. The 
documentation gives the same workspace size as the one returned, so the 
documentation of the routine seems to be incorrect as well.

Here is a temporary fix. It will need more testing to be validated but this 
works fine on one of our cluster. Let me know if this works as well for 
you.

Replace Line 221-222 of pdgetri.f -               LIWMIN = NQ + MAX( ICEIL( 
ICEIL( MP, DESCA( MB_ ) ),
-     $                            LCM / NPROW ), DESCA( NB_ ) )
with the following seven lines
+               LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
+     $                  + MOD ( IA - 1, MB_P ), DESCA ( NB_ ), MYCOL,
+     $                  DESCA( CSRC_ ), NPCOL ) + MAX ( DESCA( MB_ )
+     $                  * ICEIL ( ICEIL( NUMROC( DESCA( M_ )
+     $                  + DESCA( MB_ ) * NPROW, DESCA( MB_ ), MYROW,
+     $                  DESCA( RSRC_ ), NPROW ), DESCA( MB_ ) ),
+     $                  LCM / NPROW ), DESCA( NB_ ) )

I have tested matrices of size n=[550], with nb=[13 32] for all the grids 
possible with less than 16 processors ( 1-1, 1-2, 2-1, 1-3, 3-1, 1-4, 2-2, 
4-1, 1-5, .... , 16-1), the routine PDGETRI with this modification works 
fine, whereas the previous one was only working on square matrices if one 
was using the minimum LIWORK. You were right.

So, temporary fix, you compile pdgetri.f with your fortran compiler, link 
with scalapack but put pdgetri.o before the libscalapack.a library and this 
should work.

We'll take a bit of time here to have a second look and certainly release a 
more official patch soon.

Feel free for question, remarks, comments.

Julien



-------------- next part --------------
      SUBROUTINE PDGETRI( N, A, IA, JA, DESCA, IPIV, WORK, LWORK,
     $                    IWORK, LIWORK, INFO )
*
*  -- ScaLAPACK routine (version 1.7.4) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     v1.7.4: May 10, 2006 
*     v1.7:   May 1,  1997
*
*     .. Scalar Arguments ..
      INTEGER            IA, INFO, JA, LIWORK, LWORK, N
*     ..
*     .. Array Arguments ..
      INTEGER            DESCA( * ), IPIV( * ), IWORK( * )
      DOUBLE PRECISION   A( * ), WORK( * )
*     ..
*
*  Purpose
*  =======
*
*  PDGETRI computes the inverse of a distributed matrix using the LU
*  factorization computed by PDGETRF. This method inverts U and then
*  computes the inverse of sub( A ) = A(IA:IA+N-1,JA:JA+N-1) denoted
*  InvA by solving the system InvA*L = inv(U) for InvA.
*
*  Notes
*  =====
*
*  Each global data object is described by an associated description
*  vector.  This vector stores the information required to establish
*  the mapping between an object element and its corresponding process
*  and memory location.
*
*  Let A be a generic term for any 2D block cyclicly distributed array.
*  Such a global array has an associated description vector DESCA.
*  In the following comments, the character _ should be read as
*  "of the global array".
*
*  NOTATION        STORED IN      EXPLANATION
*  --------------- -------------- --------------------------------------
*  DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
*                                 DTYPE_A = 1.
*  CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
*                                 the BLACS process grid A is distribu-
*                                 ted over. The context itself is glo-
*                                 bal, but the handle (the integer
*                                 value) may vary.
*  M_A    (global) DESCA( M_ )    The number of rows in the global
*                                 array A.
*  N_A    (global) DESCA( N_ )    The number of columns in the global
*                                 array A.
*  MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
*                                 the rows of the array.
*  NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
*                                 the columns of the array.
*  RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
*                                 row of the array A is distributed.
*  CSRC_A (global) DESCA( CSRC_ ) The process column over which the
*                                 first column of the array A is
*                                 distributed.
*  LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
*                                 array.  LLD_A >= MAX(1,LOCr(M_A)).
*
*  Let K be the number of rows or columns of a distributed matrix,
*  and assume that its process grid has dimension p x q.
*  LOCr( K ) denotes the number of elements of K that a process
*  would receive if K were distributed over the p processes of its
*  process column.
*  Similarly, LOCc( K ) denotes the number of elements of K that a
*  process would receive if K were distributed over the q processes of
*  its process row.
*  The values of LOCr() and LOCc() may be determined via a call to the
*  ScaLAPACK tool function, NUMROC:
*          LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
*          LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
*  An upper bound for these quantities may be computed by:
*          LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
*          LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
*
*  Arguments
*  =========
*
*  N       (global input) INTEGER
*          The number of rows and columns to be operated on, i.e. the
*          order of the distributed submatrix sub( A ). N >= 0.
*
*  A       (local input/local output) DOUBLE PRECISION pointer into the
*          local memory to an array of dimension (LLD_A,LOCc(JA+N-1)).
*          On entry, the local pieces of the L and U obtained by the
*          factorization sub( A ) = P*L*U computed by PDGETRF. On
*          exit, if INFO = 0, sub( A ) contains the inverse of the
*          original distributed matrix sub( A ).
*
*  IA      (global input) INTEGER
*          The row index in the global array A indicating the first
*          row of sub( A ).
*
*  JA      (global input) INTEGER
*          The column index in the global array A indicating the
*          first column of sub( A ).
*
*  DESCA   (global and local input) INTEGER array of dimension DLEN_.
*          The array descriptor for the distributed matrix A.
*
*  IPIV    (local input) INTEGER array, dimension LOCr(M_A)+MB_A
*          keeps track of the pivoting information. IPIV(i) is the
*          global row index the local row i was swapped with.  This
*          array is tied to the distributed matrix A.
*
*  WORK    (local workspace/local output) DOUBLE PRECISION array,
*                                                     dimension (LWORK)
*          On exit, WORK(1) returns the minimal and optimal LWORK.
*
*  LWORK   (local or global input) INTEGER
*          The dimension of the array WORK.
*          LWORK is local input and must be at least
*          LWORK = LOCr(N+MOD(IA-1,MB_A))*NB_A. WORK is used to keep a
*          copy of at most an entire column block of sub( A ).
*
*          If LWORK = -1, then LWORK is global input and a workspace
*          query is assumed; the routine only calculates the minimum
*          and optimal size for all work arrays. Each of these
*          values is returned in the first entry of the corresponding
*          work array, and no error message is issued by PXERBLA.
*
*  IWORK   (local workspace/local output) INTEGER array,
*                                                    dimension (LIWORK)
*          On exit, IWORK(1) returns the minimal and optimal LIWORK.
*
*  LIWORK  (local or global input) INTEGER
*          The dimension of the array IWORK used as workspace for
*          physically transposing the pivots.
*          LIWORK is local input and must be at least
*          if NPROW == NPCOL then
*            LIWORK = LOCc( N_A + MOD(JA-1, NB_A) ) + NB_A,
*          else
*            LIWORK =  LOCc( N_A + MOD(JA-1, NB_A) ) +
*                      MAX( CEIL(CEIL(LOCr(M_A)/MB_A)/(LCM/NPROW)),
*                           NB_A )
*              where LCM is the least common multiple of process
*              rows and columns (NPROW and NPCOL).
*          end if
*
*          If LIWORK = -1, then LIWORK is global input and a workspace
*          query is assumed; the routine only calculates the minimum
*          and optimal size for all work arrays. Each of these
*          values is returned in the first entry of the corresponding
*          work array, and no error message is issued by PXERBLA.
*
*  INFO    (global output) INTEGER
*          = 0:  successful exit
*          < 0:  If the i-th argument is an array and the j-entry had
*                an illegal value, then INFO = -(i*100+j), if the i-th
*                argument is a scalar and had an illegal value, then
*                INFO = -i.
*            > 0:  If INFO = K, U(IA+K-1,IA+K-1) is exactly zero; the
*                  matrix is singular and its inverse could not be
*                  computed.
*
*  =====================================================================
*
*     .. Parameters ..
      INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
     $                   LLD_, MB_, M_, NB_, N_, RSRC_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
     $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
      DOUBLE PRECISION   ZERO, ONE
      PARAMETER          ( ZERO = 0.0D+0, ONE = 1.0D+0 )
*     ..
*     .. Local Scalars ..
      LOGICAL            LQUERY
      INTEGER            I, IACOL, IAROW, ICOFF, ICTXT, IROFF, IW, J,
     $                   JB, JN, LCM, LIWMIN, LWMIN, MP, MYCOL, MYROW,
     $                   NN, NP, NPCOL, NPROW, NQ
*     ..
*     .. Local Arrays ..
      INTEGER            DESCW( DLEN_ ), IDUM1( 2 ), IDUM2( 2 )
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDINFO, CHK1MAT, DESCSET, PCHK1MAT,
     $                   PDGEMM, PDLACPY, PDLASET, PDLAPIV,
     $                   PDTRSM, PDTRTRI, PXERBLA
*     ..
*     .. External Functions ..
      INTEGER            ICEIL, ILCM, INDXG2P, NUMROC
      EXTERNAL           ICEIL, ILCM, INDXG2P, NUMROC
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          DBLE, MAX, MIN, MOD
*     ..
*     .. Executable Statements ..
*
*     Get grid parameters
*
      ICTXT = DESCA( CTXT_ )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Test the input parameters
*
      INFO = 0
      IF( NPROW.EQ.-1 ) THEN
         INFO = -(500+CTXT_)
      ELSE
         CALL CHK1MAT( N, 1, N, 1, IA, JA, DESCA, 5, INFO )
         IF( INFO.EQ.0 ) THEN
            IROFF = MOD( IA-1, DESCA( MB_ ) )
            ICOFF = MOD( JA-1, DESCA( NB_ ) )
            IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ),
     $                       NPROW )
            NP = NUMROC( N+IROFF, DESCA( MB_ ), MYROW, IAROW, NPROW )
            LWMIN = NP * DESCA( NB_ )
*
            MP = NUMROC( DESCA( M_ ), DESCA( MB_ ), MYROW,
     $                   DESCA( RSRC_ ), NPROW )
            NQ = NUMROC( DESCA( N_ ), DESCA( NB_ ), MYCOL,
     $                   DESCA( CSRC_ ), NPCOL )
            IF( NPROW.EQ.NPCOL ) THEN
               LIWMIN = NQ + DESCA( NB_ )
            ELSE
*
* Use the formula for the workspace given in PxLAPIV
* to compute the minimum size LIWORK for IWORK
*
* The formula in PxLAPIV is
*   LDW = LOCc( M_P + MOD(IP-1, MB_P) ) +
*         MB_P * CEIL( CEIL(LOCr(M_P)/MB_P) / (LCM/NPROW) )
*
* where 
*   M_P     is the global length of the pivot vector
*           MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW
*   I_P     is IA
*           I_P = IA
*   MB_P    is the block size use for the block cyclic distribution of the 
*           pivot vector
*           MB_P = DESCA (MB_ )
*   LOCc ( . ) 
*           NUMROC ( . , DESCA ( NB_ ), MYCOL, DESCA ( CSRC_ ), NPCOL )
*   LOCr ( . )
*           NUMROC ( . , DESCA ( MB_ ), MYROW, DESCA ( RSRC_ ), NPROW )
*   CEIL ( X / Y )
*           ICEIL( X, Y )
*   LCM 
*           LCM = ILCM( NPROW, NPCOL )
*
               LCM = ILCM( NPROW, NPCOL )
               LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
     $                  + MOD ( IA - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
     $                  MYCOL, DESCA( CSRC_ ), NPCOL ) +
     $                  MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
     $                  NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW,
     $                  DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
     $                  DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_ ) )
*
            END IF
*
            WORK( 1 ) = DBLE( LWMIN )
            IWORK( 1 ) = LIWMIN
            LQUERY = ( LWORK.EQ.-1 .OR. LIWORK.EQ.-1 )
            IF( IROFF.NE.ICOFF .OR. IROFF.NE.0 ) THEN
               INFO = -4
            ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
               INFO = -(500+NB_)
            ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN
               INFO = -8
            ELSE IF( LIWORK.LT.LIWMIN .AND. .NOT.LQUERY ) THEN
               INFO = -10
            END IF
         END IF
         IF( LWORK.EQ.-1 ) THEN
            IDUM1( 1 ) = -1
         ELSE
            IDUM1( 1 ) = 1
         END IF
         IDUM2( 1 ) = 8
         IF( LIWORK.EQ.-1 ) THEN
            IDUM1( 2 ) = -1
         ELSE
            IDUM1( 2 ) = 1
         END IF
         IDUM2( 2 ) = 10
         CALL PCHK1MAT( N, 1, N, 1, IA, JA, DESCA, 5, 2, IDUM1, IDUM2,
     $                  INFO )
      END IF
*
      IF( INFO.NE.0 ) THEN
         CALL PXERBLA( ICTXT, 'PDGETRI', -INFO )
         RETURN
      ELSE IF( LQUERY ) THEN
         RETURN
      END IF
*
*     Quick return if possible
*
      IF( N.EQ.0 )
     $   RETURN
*
*     Form inv(U).  If INFO > 0 from PDTRTRI, then U is singular,
*     and the inverse is not computed.
*
      CALL PDTRTRI( 'Upper', 'Non-unit', N, A, IA, JA, DESCA, INFO )
      IF( INFO.GT.0 )
     $   RETURN
*
*     Define array descriptor for working array WORK
*
      JN = MIN( ICEIL( JA, DESCA( NB_ ) ) * DESCA( NB_ ), JA+N-1 )
      NN = ( ( JA+N-2 ) / DESCA( NB_ ) ) * DESCA( NB_ ) + 1
      IACOL = INDXG2P( NN, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), NPCOL )
      CALL DESCSET( DESCW, N+IROFF, DESCA( NB_ ), DESCA( MB_ ),
     $              DESCA( NB_ ), IAROW, IACOL, ICTXT, MAX( 1, NP ) )
      IW = IROFF + 1
*
*     Solve the equation inv(A)*L=inv(U) for inv(A) using blocked code.
*
      DO 10 J = NN, JN+1, -DESCA( NB_ )
         JB = MIN( DESCA( NB_ ), JA+N-J )
         I = IA + J - JA
*
*        Copy current block column of L to WORK and replace with zeros.
*
         CALL PDLACPY( 'Lower', JA+N-1-J, JB, A, I+1, J, DESCA,
     $                 WORK, IW+J-JA+1, 1, DESCW )
         CALL PDLASET( 'Lower', JA+N-1-J, JB, ZERO, ZERO, A, I+1, J,
     $                 DESCA )
*
*        Compute current block column of inv(A).
*
         IF( J+JB.LE.JA+N-1 )
     $      CALL PDGEMM( 'No transpose', 'No transpose', N, JB,
     $                   JA+N-J-JB, -ONE, A, IA, J+JB, DESCA, WORK,
     $                   IW+J+JB-JA, 1, DESCW, ONE, A, IA, J, DESCA )
         CALL PDTRSM( 'Right', 'Lower', 'No transpose', 'Unit', N, JB,
     $                ONE, WORK, IW+J-JA, 1, DESCW, A, IA, J, DESCA )
         DESCW( CSRC_ ) = MOD( DESCW( CSRC_ ) + NPCOL - 1, NPCOL )
*
   10 CONTINUE
*
*     Handle the last block of columns separately
*
      JB = JN-JA+1
*
*     Copy current block column of L to WORK and replace with zeros.
*
      CALL PDLACPY( 'Lower', N-1, JB, A, IA+1, JA, DESCA, WORK, IW+1,
     $              1, DESCW )
      CALL PDLASET( 'Lower', N-1, JB, ZERO, ZERO, A, IA+1, JA, DESCA )
*
*     Compute current block column of inv(A).
*
      IF( JA+JB.LE.JA+N-1 )
     $   CALL PDGEMM( 'No transpose', 'No transpose', N, JB,
     $                N-JB, -ONE, A, IA, JA+JB, DESCA, WORK, IW+JB, 1,
     $                DESCW, ONE, A, IA, JA, DESCA )
      CALL PDTRSM( 'Right', 'Lower', 'No transpose', 'Unit', N, JB,
     $             ONE, WORK, IW, 1, DESCW, A, IA, JA, DESCA )
*
*     Use the row pivots and apply them to the columns of the global
*     matrix.
*
*
* JL: I do not get why the size of the PIVOT vector is DESCA( M_ ) + DESCA( MB_ 
)*NPROW
* should be DESCA( M_ ) + DESCA( MB_ ) no?
*
      CALL DESCSET( DESCW, DESCA( M_ ) + DESCA( MB_ )*NPROW, 1,
     $              DESCA( MB_ ), 1, DESCA( RSRC_ ), MYCOL, ICTXT,
     $              MP+DESCA( MB_ ) )
      CALL PDLAPIV( 'Backward', 'Columns', 'Column', N, N, A, IA,
     $              JA, DESCA, IPIV, IA, 1, DESCW, IWORK )
*
      WORK( 1 ) = DBLE( LWMIN )
      IWORK( 1 ) = LIWMIN
*
      RETURN
*
*     End of PDGETRI
*
      END
-------------- next part --------------
      PROGRAM PDINVDRIVER
*
*  -- ScaLAPACK testing driver (version 1.7) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     May 1, 1997
*
*  Purpose
*  =======
*
*  PDINVDRIVER is the main test program for the DOUBLE PRECISION
*  SCALAPACK matrix inversion routines.  This test driver computes the
*  inverse of different kind of matrix and tests the results.
*
*  The program must be driven by a short data file. An annotated example
*  of a data file can be obtained by deleting the first 3 characters
*  from the following 14 lines:
*  'ScaLAPACK Matrix Inversion Testing input file'
*  'PVM machine.'
*  'INV.out'                   output file name (if any)
*  6                           device out
*  5                           number of matrix types (next line)
*  'GEN' 'UTR' 'LTR' 'UPD' LPD' GEN, UTR, LTR, UPD, LPD
*  4                           number of problems sizes
*  1000 2000 3000 4000         values of N
*  3                           number of NB's
*  4 30 35                     values of NB
*  2                           number of process grids (ordered P & Q)
*  4 2                         values of P
*  4 4                         values of Q
*  1.0                         threshold
*
*  Internal Parameters
*  ===================
*
*  TOTMEM   INTEGER, default = 2000000
*           TOTMEM is a machine-specific parameter indicating the
*           maximum amount of available memory in bytes.
*           The user should customize TOTMEM to his platform.  Remember
*           to leave room in memory for the operating system, the BLACS
*           buffer, etc.  For example, on a system with 8 MB of memory
*           per process (e.g., one processor on an Intel iPSC/860), the
*           parameters we use are TOTMEM=6200000 (leaving 1.8 MB for OS,
*           code, BLACS buffer, etc).  However, for PVM, we usually set
*           TOTMEM = 2000000.  Some experimenting with the maximum value
*           of TOTMEM may be required.
*
*  INTGSZ   INTEGER, default = 4 bytes.
*  DBLESZ   INTEGER, default = 8 bytes.
*           INTGSZ and DBLESZ indicate the length in bytes on the
*           given platform for an integer and a double precision real.
*  MEM      DOUBLE PRECISION array, dimension ( TOTMEM / DBLESZ )
*
*           All arrays used by SCALAPACK routines are allocated from
*           this array and referenced by pointers.  The integer IPA,
*           for example, is a pointer to the starting element of MEM for
*           the matrix A.
*
*  =====================================================================
*
*     .. Parameters ..
      INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
     $                   LLD_, MB_, M_, NB_, N_, RSRC_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
     $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
      INTEGER            DBLESZ, INTGSZ, MEMSIZ, NTESTS, TOTMEM
      DOUBLE PRECISION   PADVAL, ZERO
      PARAMETER          ( DBLESZ = 8, INTGSZ = 4, TOTMEM = 2000000,
     $                     MEMSIZ = TOTMEM / DBLESZ, NTESTS = 20,
     $                     PADVAL = -9923.0D+0, ZERO = 0.0D+0 )
*     ..
*     .. Local Scalars ..
      CHARACTER          UPLO
      CHARACTER*3        MTYP
      CHARACTER*6        PASSED
      CHARACTER*80       OUTFILE
      LOGICAL            CHECK
      INTEGER            I, IAM, IASEED, ICTXT, IMIDPAD, INFO, IPA,
     $                   IPPIV, IPREPAD, IPOSTPAD, IPIW, IPW, ITEMP, J,
     $                   K, KTESTS, KPASS, KFAIL, KSKIP, L, LCM, LIPIV,
     $                   LIWORK, LWORK, MYCOL, MYROW, N, NB, NGRIDS,
     $                   NMAT, NMTYP, NNB, NOUT, NP, NPCOL, NPROCS,
     $                   NPROW, NQ, WORKIINV, WORKINV, WORKSIZ
      REAL               THRESH
      DOUBLE PRECISION   ANORM, FRESID, NOPS, RCOND, TMFLOPS
*     ..
*     .. Local Arrays ..
      CHARACTER*3        MATTYP( NTESTS )
      INTEGER            DESCA( DLEN_ ), IERR( 1 ), NBVAL( NTESTS ),
     $                   NVAL( NTESTS ), PVAL( NTESTS ),
     $                   QVAL( NTESTS )
      DOUBLE PRECISION   MEM( MEMSIZ ), CTIME( 2 ), WTIME( 2 )
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_BARRIER, BLACS_EXIT, BLACS_GET,
     $                   BLACS_GRIDEXIT, BLACS_GRIDINFO, BLACS_GRIDINIT,
     $                   BLACS_PINFO, DESCINIT, IGSUM2D, PDCHEKPAD,
     $                   PDFILLPAD, PDGETRF, PDGETRI,
     $                   PDINVCHK, PDINVINFO, PDLASET,
     $                   PDMATGEN, PDPOTRF, PDPOTRI,
     $                   PDTRTRI, SLBOOT, SLCOMBINE, SLTIMER
*     ..
*     .. External Functions ..
      LOGICAL            LSAMEN
      INTEGER            ICEIL, ILCM, NUMROC
      DOUBLE PRECISION   PDLANGE, PDLANSY, PDLANTR
      EXTERNAL           ICEIL, ILCM, LSAMEN, NUMROC, PDLANGE,
     $                   PDLANSY, PDLANTR
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          DBLE, MAX, MOD
*     ..
*     .. Data Statements ..
      DATA               KTESTS, KPASS, KFAIL, KSKIP /4*0/
*     ..
*     .. Executable Statements ..
*
*     Get starting information
*
      CALL BLACS_PINFO( IAM, NPROCS )
      IASEED = 100
      CALL PDINVINFO( OUTFILE, NOUT, NMTYP, MATTYP, NTESTS, NMAT, NVAL,
     $                NTESTS, NNB, NBVAL, NTESTS, NGRIDS, PVAL, NTESTS,
     $                QVAL, NTESTS, THRESH, MEM, IAM, NPROCS )
      CHECK = ( THRESH.GE.0.0E+0 )
*
*     Loop over the different matrix types
*
      DO 40 I = 1, NMTYP
*
         MTYP = MATTYP( I )
*
*        Print headings
*
         IF( IAM.EQ.0 ) THEN
            WRITE( NOUT, FMT = * )
            IF( LSAMEN( 3, MTYP, 'GEN' ) ) THEN
               WRITE( NOUT, FMT = 9986 )
     $                'A is a general matrix.'
            ELSE IF( LSAMEN( 3, MTYP, 'UTR' ) ) THEN
               WRITE( NOUT, FMT = 9986 )
     $               'A is an upper triangular matrix.'
            ELSE IF( LSAMEN( 3, MTYP, 'LTR' ) ) THEN
               WRITE( NOUT, FMT = 9986 )
     $               'A is a lower triangular matrix.'
            ELSE IF( LSAMEN( 3, MTYP, 'UPD' ) ) THEN
               WRITE( NOUT, FMT = 9986 )
     $               'A is a symmetric positive definite matrix.'
               WRITE( NOUT, FMT = 9986 )
     $               'Only the upper triangular part will be '//
     $               'referenced.'
            ELSE IF( LSAMEN( 3, MTYP, 'LPD' ) ) THEN
               WRITE( NOUT, FMT = 9986 )
     $               'A is a symmetric positive definite matrix.'
               WRITE( NOUT, FMT = 9986 )
     $               'Only the lower triangular part will be '//
     $               'referenced.'
            END IF
            WRITE( NOUT, FMT = * )
            WRITE( NOUT, FMT = 9995 )
            WRITE( NOUT, FMT = 9994 )
            WRITE( NOUT, FMT = * )
         END IF
*
*        Loop over different process grids
*
         DO 30 J = 1, NGRIDS
*
            NPROW = PVAL( J )
            NPCOL = QVAL( J )
*
*           Make sure grid information is correct
*
            IERR( 1 ) = 0
            IF( NPROW.LT.1 ) THEN
               IF( IAM.EQ.0 )
     $            WRITE( NOUT, FMT = 9999 ) 'GRID', 'nprow', NPROW
               IERR( 1 ) = 1
            ELSE IF( NPCOL.LT.1 ) THEN
               IF( IAM.EQ.0 )
     $            WRITE( NOUT, FMT = 9999 ) 'GRID', 'npcol', NPCOL
               IERR( 1 ) = 1
            ELSE IF( NPROW*NPCOL.GT.NPROCS ) THEN
               IF( IAM.EQ.0 )
     $            WRITE( NOUT, FMT = 9998 ) NPROW*NPCOL, NPROCS
               IERR( 1 ) = 1
            END IF
*
            IF( IERR( 1 ).GT.0 ) THEN
               IF( IAM.EQ.0 )
     $            WRITE( NOUT, FMT = 9997 ) 'grid'
               KSKIP = KSKIP + 1
               GO TO 30
            END IF
*
*           Define process grid
*
            CALL BLACS_GET( -1, 0, ICTXT )
            CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
            CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*           Go to bottom of loop if this case doesn't use my process
*
            IF( MYROW.GE.NPROW .OR. MYCOL.GE.NPCOL )
     $         GO TO 30
*
            DO 20 K = 1, NMAT
*
               N = NVAL( K )
*
*              Make sure matrix information is correct
*
               IERR( 1 ) = 0
               IF( N.LT.1 ) THEN
                  IF( IAM.EQ.0 )
     $               WRITE( NOUT, FMT = 9999 ) 'MATRIX', 'N', N
                  IERR( 1 ) = 1
               END IF
*
*              Make sure no one had error
*
               CALL IGSUM2D( ICTXT, 'All', ' ', 1, 1, IERR, 1, -1, 0 )
*
               IF( IERR( 1 ).GT.0 ) THEN
                  IF( IAM.EQ.0 )
     $               WRITE( NOUT, FMT = 9997 ) 'matrix'
                  KSKIP = KSKIP + 1
                  GO TO 20
               END IF
*
*              Loop over different blocking sizes
*
               DO 10 L = 1, NNB
*
                  NB = NBVAL( L )
*
*                 Make sure nb is legal
*
                  IERR( 1 ) = 0
                  IF( NB.LT.1 ) THEN
                     IERR( 1 ) = 1
                     IF( IAM.EQ.0 )
     $                  WRITE( NOUT, FMT = 9999 ) 'NB', 'NB', NB
                  END IF
*
*                 Check all processes for an error
*
                  CALL IGSUM2D( ICTXT, 'All', ' ', 1, 1, IERR, 1, -1,
     $                          0 )
*
                  IF( IERR( 1 ).GT.0 ) THEN
                     IF( IAM.EQ.0 )
     $                  WRITE( NOUT, FMT = 9997 ) 'NB'
                     KSKIP = KSKIP + 1
                     GO TO 10
                  END IF
*
*                 Padding constants
*
                  NP = NUMROC( N, NB, MYROW, 0, NPROW )
                  NQ = NUMROC( N, NB, MYCOL, 0, NPCOL )
                  IF( CHECK ) THEN
                     IPREPAD  = MAX( NB, NP )
                     IMIDPAD  = NB
                     IPOSTPAD = MAX( NB, NQ )
                  ELSE
                     IPREPAD  = 0
                     IMIDPAD  = 0
                     IPOSTPAD = 0
                  END IF
*
*                 Initialize the array descriptor for the matrix A
*
                  CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, ICTXT,
     $                           MAX( 1, NP ) + IMIDPAD, IERR( 1 ) )
*
*                 Check all processes for an error
*
                  CALL IGSUM2D( ICTXT, 'All', ' ', 1, 1, IERR, 1, -1,
     $                          0 )
*
                  IF( IERR( 1 ).LT.0 ) THEN
                     IF( IAM.EQ.0 )
     $                  WRITE( NOUT, FMT = 9997 ) 'descriptor'
                     KSKIP = KSKIP + 1
                     GO TO 10
                  END IF
*
*                 Assign pointers into MEM for ScaLAPACK arrays, A is
*                 allocated starting at position MEM( IPREPAD+1 )
*
                  IPA = IPREPAD+1
*
                  LCM = ILCM( NPROW, NPCOL )
                  IF( LSAMEN( 3, MTYP, 'GEN' ) ) THEN
*
*                    Pivots are needed by LU factorization
*
                     IPPIV = IPA + DESCA( LLD_ ) * NQ + IPOSTPAD +
     $                       IPREPAD
                     LIPIV = ICEIL( INTGSZ * ( NP + NB ), DBLESZ )
                     IPW = IPPIV + LIPIV + IPOSTPAD + IPREPAD
*
                     LWORK = MAX( 1, NP * DESCA( NB_ ) )
                     WORKINV = LWORK + IPOSTPAD
*
*                    Figure the amount of workspace required by the
*                    general matrix inversion
*
                     IF( NPROW.EQ.NPCOL ) THEN
                        LIWORK = NQ + DESCA( NB_ )
                     ELSE
*
* change the integer workspace needed for PDGETRI
*                        LIWORK = MAX( DESCA( NB_ ), DESCA( MB_ ) *
*     $                           ICEIL( ICEIL( DESCA( LLD_ ),
*     $                                  DESCA( MB_ ) ), LCM / NPROW ) )
*     $                                  + NQ
                         LIWORK = NUMROC( DESCA( M_ ) +
     $                  DESCA( MB_ ) * NPROW
     $                  + MOD ( 1 - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
     $                  MYCOL, DESCA( CSRC_ ), NPCOL ) +
     $                  MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
     $                  NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW,
     $                  DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
     $                  DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_ ) )
*
                     END IF
                     WORKIINV = ICEIL( LIWORK*INTGSZ, DBLESZ ) +
     $                          IPOSTPAD
                     IPIW = IPW + WORKINV + IPREPAD
                     WORKSIZ = WORKINV + IPREPAD + WORKIINV
*
                  ELSE
*
*                    No pivots or workspace needed for triangular or
*                    symmetric positive definite matrices.
*
                     IPW = IPA + DESCA( LLD_ ) * NQ + IPOSTPAD + IPREPAD
                     WORKSIZ = 1 + IPOSTPAD
*
                  END IF
*
                  IF( CHECK ) THEN
*
*                    Figure amount of work space for the norm
*                    computations
*
                     IF( LSAMEN( 3, MTYP, 'GEN'       ).OR.
     $                   LSAMEN( 2, MTYP( 2:3 ), 'TR' ) ) THEN
                        ITEMP = NQ
                     ELSE
                        ITEMP = 2 * NQ + NP
                        IF( NPROW.NE.NPCOL ) THEN
                           ITEMP = ITEMP +
     $                             NB * ICEIL( ICEIL( NP, NB ),
     $                                         LCM / NPROW )
                        END IF
                     END IF
                     WORKSIZ = MAX( WORKSIZ-IPOSTPAD, ITEMP )
*
*                    Figure the amount of workspace required by the
*                    checking routine
*
                     WORKSIZ = MAX( WORKSIZ, 2 * NB * MAX( 1, NP ) ) +
     $                         IPOSTPAD
*
                  END IF
*
*                 Check for adequate memory for problem size
*
                  IERR( 1 ) = 0
                  IF( IPW+WORKSIZ.GT.MEMSIZ ) THEN
                     IF( IAM.EQ.0 )
     $                  WRITE( NOUT, FMT = 9996 ) 'inversion',
     $                         ( IPW + WORKSIZ ) * DBLESZ
                     IERR( 1 ) = 1
                  END IF
*
*                 Check all processes for an error
*
                  CALL IGSUM2D( ICTXT, 'All', ' ', 1, 1, IERR, 1, -1,
     $                          0 )
*
                  IF( IERR( 1 ).GT.0 ) THEN
                     IF( IAM.EQ.0 )
     $                  WRITE( NOUT, FMT = 9997 ) 'MEMORY'
                     KSKIP = KSKIP + 1
                     GO TO 10
                  END IF
*
                  IF( LSAMEN( 3, MTYP, 'GEN'       ).OR.
     $                LSAMEN( 2, MTYP( 2:3 ), 'TR' ) ) THEN
*
*                    Generate a general diagonally dominant matrix A
*
                     CALL PDMATGEN( ICTXT, 'N', 'D', DESCA( M_ ),
     $                              DESCA( N_ ), DESCA( MB_ ),
     $                              DESCA( NB_ ), MEM( IPA ),
     $                              DESCA( LLD_ ), DESCA( RSRC_ ),
     $                              DESCA( CSRC_ ), IASEED, 0, NP, 0,
     $                              NQ, MYROW, MYCOL, NPROW, NPCOL )
*
                  ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'PD' ) ) THEN
*
*                    Generate a symmetric positive definite matrix
*
                     CALL PDMATGEN( ICTXT, 'S', 'D', DESCA( M_ ),
     $                              DESCA( N_ ), DESCA( MB_ ),
     $                              DESCA( NB_ ), MEM( IPA ),
     $                              DESCA( LLD_ ), DESCA( RSRC_ ),
     $                              DESCA( CSRC_ ), IASEED, 0, NP, 0,
     $                              NQ, MYROW, MYCOL, NPROW, NPCOL )
*
                  END IF
*
*                 Zeros not-referenced part of A, if any.
*
                  IF( LSAMEN( 1, MTYP, 'U' ) ) THEN
*
                     UPLO = 'U'
                     CALL PDLASET( 'Lower', N-1, N-1, ZERO, ZERO,
     $                             MEM( IPA ), 2, 1, DESCA )
*
                  ELSE IF( LSAMEN( 1, MTYP, 'L' ) ) THEN
*
                     UPLO = 'L'
                     CALL PDLASET( 'Upper', N-1, N-1, ZERO, ZERO,
     $                             MEM( IPA ), 1, 2, DESCA )
*
                  ELSE
*
                     UPLO = 'G'
*
                  END IF
*
*                 Need 1-norm of A for checking
*
                  IF( CHECK ) THEN
*
                     CALL PDFILLPAD( ICTXT, NP, NQ, MEM( IPA-IPREPAD ),
     $                               DESCA( LLD_ ), IPREPAD, IPOSTPAD,
     $                               PADVAL )
                     CALL PDFILLPAD( ICTXT, WORKSIZ-IPOSTPAD, 1,
     $                               MEM( IPW-IPREPAD ),
     $                               WORKSIZ-IPOSTPAD, IPREPAD,
     $                               IPOSTPAD, PADVAL )
*
                     IF( LSAMEN( 3, MTYP, 'GEN' ) ) THEN
*
                        CALL PDFILLPAD( ICTXT, LIPIV, 1,
     $                                 MEM( IPPIV-IPREPAD ), LIPIV,
     $                                 IPREPAD, IPOSTPAD, PADVAL )
                        ANORM = PDLANGE( '1', N, N, MEM( IPA ), 1, 1,
     $                                   DESCA, MEM( IPW ) )
                        CALL PDCHEKPAD( ICTXT, 'PDLANGE', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDLANGE',
     $                                  WORKSIZ-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKSIZ-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDFILLPAD( ICTXT, WORKINV-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKINV-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDFILLPAD( ICTXT, WORKIINV-IPOSTPAD, 1,
     $                                 MEM( IPIW-IPREPAD ),
     $                                 WORKIINV-IPOSTPAD, IPREPAD,
     $                                 IPOSTPAD, PADVAL )
                     ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'TR' ) ) THEN
*
                        ANORM = PDLANTR( '1', UPLO, 'Non unit', N, N,
     $                                   MEM( IPA ), 1, 1, DESCA,
     $                                   MEM( IPW ) )
                        CALL PDCHEKPAD( ICTXT, 'PDLANTR', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDLANTR',
     $                                  WORKSIZ-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKSIZ-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
*
                     ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'PD' ) ) THEN
*
                        ANORM = PDLANSY( '1', UPLO, N, MEM( IPA ), 1, 1,
     $                                   DESCA, MEM( IPW ) )
                        CALL PDCHEKPAD( ICTXT, 'PDLANSY', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDLANSY',
     $                                  WORKSIZ-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKSIZ-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
*
                     ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'SY' ) ) THEN
*
                        CALL PDFILLPAD( ICTXT, LIPIV, 1,
     $                                  MEM( IPPIV-IPREPAD ), LIPIV,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        ANORM = PDLANSY( '1', UPLO, N, MEM( IPA ), 1, 1,
     $                                   DESCA, MEM( IPW ) )
                        CALL PDCHEKPAD( ICTXT, 'PDLANSY', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDLANSY',
     $                                  WORKSIZ-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKSIZ-IPOSTPAD,
     $                                  IPREPAD,IPOSTPAD, PADVAL )
*
                     END IF
*
                  END IF
*
                  CALL SLBOOT()
                  CALL BLACS_BARRIER( ICTXT, 'All' )
*
                  IF( LSAMEN( 3, MTYP, 'GEN' ) ) THEN
*
*                    Perform LU factorization
*
                     CALL SLTIMER( 1 )
                     CALL PDGETRF( N, N, MEM( IPA ), 1, 1, DESCA,
     $                             MEM( IPPIV ), INFO )
                     CALL SLTIMER( 1 )
*
                     IF( CHECK ) THEN
*
*                       Check for memory overwrite
*
                        CALL PDCHEKPAD( ICTXT, 'PDGETRF', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDGETRF', LIPIV, 1,
     $                                  MEM( IPPIV-IPREPAD ), LIPIV,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                     END IF
*
*                    Perform the general matrix inversion
*
                     CALL SLTIMER( 2 )
                     CALL PDGETRI( N, MEM( IPA ), 1, 1, DESCA,
     $                             MEM( IPPIV ), MEM( IPW ), LWORK,
     $                             MEM( IPIW ), LIWORK, INFO )
                     CALL SLTIMER( 2 )
*
                     IF( CHECK ) THEN
*
*                       Check for memory overwrite
*
                        CALL PDCHEKPAD( ICTXT, 'PDGETRI', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDGETRI', LIPIV, 1,
     $                                  MEM( IPPIV-IPREPAD ), LIPIV,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDGETRI',
     $                                  WORKIINV-IPOSTPAD, 1,
     $                                  MEM( IPIW-IPREPAD ),
     $                                  WORKIINV-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                        CALL PDCHEKPAD( ICTXT, 'PDGETRI',
     $                                  WORKINV-IPOSTPAD, 1,
     $                                  MEM( IPW-IPREPAD ),
     $                                  WORKINV-IPOSTPAD,
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                     END IF
*
                  ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'TR' ) ) THEN
*
*                    Perform the general matrix inversion
*
                     CALL SLTIMER( 2 )
                     CALL PDTRTRI( UPLO, 'Non unit', N, MEM( IPA ), 1,
     $                             1, DESCA, INFO )
                     CALL SLTIMER( 2 )
*
                     IF( CHECK ) THEN
*
*                       Check for memory overwrite
*
                        CALL PDCHEKPAD( ICTXT, 'PDTRTRI', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                     END IF
*
                  ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'PD' ) ) THEN
*
*                    Perform Cholesky factorization
*
                     CALL SLTIMER( 1 )
                     CALL PDPOTRF( UPLO, N, MEM( IPA ), 1, 1, DESCA,
     $                             INFO )
                     CALL SLTIMER( 1 )
*
                     IF( CHECK ) THEN
*
*                       Check for memory overwrite
*
                        CALL PDCHEKPAD( ICTXT, 'PDPOTRF', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                     END IF
*
*                    Perform the symmetric positive definite matrix
*                    inversion
*
                     CALL SLTIMER( 2 )
                     CALL PDPOTRI( UPLO, N, MEM( IPA ), 1, 1, DESCA,
     $                             INFO )
                     CALL SLTIMER( 2 )
*
                     IF( CHECK ) THEN
*
*                       Check for memory overwrite
*
                        CALL PDCHEKPAD( ICTXT, 'PDPOTRI', NP, NQ,
     $                                  MEM( IPA-IPREPAD ),
     $                                  DESCA( LLD_ ),
     $                                  IPREPAD, IPOSTPAD, PADVAL )
                     END IF
*
                  END IF
*
                  IF( CHECK ) THEN
*
                     CALL PDFILLPAD( ICTXT, WORKSIZ-IPOSTPAD, 1,
     $                               MEM( IPW-IPREPAD ),
     $                               WORKSIZ-IPOSTPAD, IPREPAD,
     $                               IPOSTPAD, PADVAL )
*
*                    Compute fresid = || inv(A)*A-I ||
*
                     CALL PDINVCHK( MTYP, N, MEM( IPA ), 1, 1, DESCA,
     $                              IASEED, ANORM, FRESID, RCOND,
     $                              MEM( IPW ) )
*
*                    Check for memory overwrite
*
                     CALL PDCHEKPAD( ICTXT, 'PDINVCHK', NP, NQ,
     $                               MEM( IPA-IPREPAD ),
     $                               DESCA( LLD_ ),
     $                               IPREPAD, IPOSTPAD, PADVAL )
                     CALL PDCHEKPAD( ICTXT, 'PDINVCHK',
     $                               WORKSIZ-IPOSTPAD, 1,
     $                               MEM( IPW-IPREPAD ),
     $                               WORKSIZ-IPOSTPAD, IPREPAD,
     $                               IPOSTPAD, PADVAL )
*
*                    Test residual and detect NaN result
*
                     IF( FRESID.LE.THRESH .AND. INFO.EQ.0 .AND.
     $                   ( (FRESID-FRESID) .EQ. 0.0D+0 ) ) THEN
                        KPASS = KPASS + 1
                        PASSED = 'PASSED'
                     ELSE
                        KFAIL = KFAIL + 1
                        IF( INFO.GT.0 ) THEN
                           PASSED = 'SINGUL'
                        ELSE
                           PASSED = 'FAILED'
                        END IF
                     END IF
*
                  ELSE
*
*                    Don't perform the checking, only the timing
*                    operation
*
                     KPASS = KPASS + 1
                     FRESID = FRESID - FRESID
                     PASSED = 'BYPASS'
*
                  END IF
*
*                 Gather maximum of all CPU and WALL clock timings
*
                  CALL SLCOMBINE( ICTXT, 'All', '>', 'W', 2, 1, WTIME )
                  CALL SLCOMBINE( ICTXT, 'All', '>', 'C', 2, 1, CTIME )
*
*                 Print results
*
                  IF( MYROW.EQ.0 .AND. MYCOL.EQ.0  ) THEN
*
                     IF( LSAMEN( 3, MTYP, 'GEN' ) ) THEN
*
*                       2/3 N^3 - 1/2 N^2 flops for LU factorization
*
                        NOPS = ( 2.0D+0 / 3.0D+0 )*( DBLE( N )**3 ) -
     $                         ( 1.0D+0 / 2.0D+0 )*( DBLE( N )**2 )
*
*                       4/3 N^3 - N^2 flops for inversion
*
                        NOPS = NOPS +
     $                         ( 4.0D+0 / 3.0D+0 ) * ( DBLE( N )**3 ) -
     $                         DBLE( N )**2
*
                     ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'TR' ) ) THEN
*
*                       1/3 N^3 + 2/3 N flops for triangular inversion
*
                        CTIME(1) = 0.0D+0
                        WTIME(1) = 0.0D+0
                        NOPS = ( 1.0D+0 / 3.0D+0 ) * ( DBLE( N )**3 ) +
     $                         ( 2.0D+0 / 3.0D+0 ) * ( DBLE( N ) )
*
                     ELSE IF( LSAMEN( 2, MTYP( 2:3 ), 'PD' ) ) THEN
*
*                       1/3 N^3 + 1/2 N^2 flops for Cholesky
*                       factorization
*
                        NOPS = ( 1.0D+0 / 3.0D+0 ) * ( DBLE( N )**3 ) +
     $                         ( 1.0D+0 / 2.0D+0 ) * ( DBLE( N )**2 )
*
*                       2/3 N^3  + 1/2 N^2 flops for Cholesky inversion
*
                        NOPS = NOPS +
     $                         ( 2.0D+0 / 3.0D+0 ) * ( DBLE( N )**3 ) +
     $                         ( 1.0D+0 / 2.0D+0 ) * ( DBLE( N )**2 )
*
                     END IF
*
*                    Figure total megaflops -- factorization and
*                    inversion, for WALL and CPU time, and print
*                    output.
*
*                    Print WALL time if machine supports it
*
                     IF( WTIME( 1 ) + WTIME( 2 ) .GT. 0.0D+0 ) THEN
                        TMFLOPS = NOPS /
     $                            ( ( WTIME( 1 )+WTIME( 2 ) ) * 1.0D+6 )
                     ELSE
                        TMFLOPS = 0.0D+0
                     END IF
*
                     IF( WTIME( 2 ) .GE. 0.0D+0 )
     $                  WRITE( NOUT, FMT = 9993 ) 'WALL', N, NB, NPROW,
     $                         NPCOL, WTIME( 1 ), WTIME( 2 ), TMFLOPS,
     $                         RCOND, FRESID, PASSED
*
*                    Print CPU time if machine supports it
*
                     IF( CTIME( 1 ) + CTIME( 2 ) .GT. 0.0D+0 ) THEN
                        TMFLOPS = NOPS /
     $                            ( ( CTIME( 1 )+CTIME( 2 ) ) * 1.0D+6 )
                     ELSE
                        TMFLOPS = 0.0D+0
                     END IF
*
                     IF( CTIME( 2 ) .GE. 0.0D+0 )
     $                  WRITE( NOUT, FMT = 9993 ) 'CPU ', N, NB, NPROW,
     $                         NPCOL, CTIME( 1 ), CTIME( 2 ), TMFLOPS,
     $                         RCOND, FRESID, PASSED
                  END IF
*
   10          CONTINUE
*
   20       CONTINUE
*
            CALL BLACS_GRIDEXIT( ICTXT )
*
   30    CONTINUE
*
   40 CONTINUE
*
*     Print out ending messages and close output file
*
      IF( IAM.EQ.0 ) THEN
         KTESTS = KPASS + KFAIL + KSKIP
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = 9992 ) KTESTS
         IF( CHECK ) THEN
            WRITE( NOUT, FMT = 9991 ) KPASS
            WRITE( NOUT, FMT = 9989 ) KFAIL
         ELSE
            WRITE( NOUT, FMT = 9990 ) KPASS
         END IF
         WRITE( NOUT, FMT = 9988 ) KSKIP
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = 9987 )
         IF( NOUT.NE.6 .AND. NOUT.NE.0 )
     $      CLOSE ( NOUT )
      END IF
*
      CALL BLACS_EXIT( 0 )
*
 9999 FORMAT( 'ILLEGAL ', A6, ': ', A5, ' = ', I3,
     $        '; It should be at least 1' )
 9998 FORMAT( 'ILLEGAL GRID: nprow*npcol = ', I4, '. It can be at most',
     $        I4 )
 9997 FORMAT( 'Bad ', A6, ' parameters: going on to next test case.' )
 9996 FORMAT( 'Unable to perform ', A, ': need TOTMEM of at least',
     $        I11 )
 9995 FORMAT( 'TIME     N  NB     P     Q Fct Time Inv Time ',
     $        '     MFLOPS    Cond   Resid  CHECK' )
 9994 FORMAT( '---- ----- --- ----- ----- -------- -------- ',
     $        '----------- ------- ------- ------' )
 9993 FORMAT( A4, 1X, I5, 1X, I3, 1X, I5, 1X, I5, 1X, F8.2, 1X, F8.2,
     $        1X, F11.2, 1X, F7.1, 1X, F7.2, 1X, A6 )
 9992 FORMAT( 'Finished ', I6, ' tests, with the following results:' )
 9991 FORMAT( I5, ' tests completed and passed residual checks.' )
 9990 FORMAT( I5, ' tests completed without checking.' )
 9989 FORMAT( I5, ' tests completed and failed residual checks.' )
 9988 FORMAT( I5, ' tests skipped because of illegal input values.' )
 9987 FORMAT( 'END OF TESTS.' )
 9986 FORMAT( A )
*
      STOP
*
*     End of PDINVDRIVER
*
      END

<Prev in Thread] Current Thread [Next in Thread>


For additional information you may use the LAPACK/ScaLAPACK Forum.
Or one of the mailing lists, or