Scalapack example sample_pssyev_call.f fails for big matrix

Open discussion regarding features, bugs, issues, vendors, etc.

Scalapack example sample_pssyev_call.f fails for big matrix

Postby 2linear » Sat Apr 02, 2011 11:07 am

Hi,
I am using Scalapack example sample_pssyev_call.f to solve the eigenproblem for big matrices. So far I have used the code successfully to solve matrix of size 23400x23400. The code fails when I try to solve a larger matrix of size 39600x39600, however. According to the message below the problem seems to be with LWORK. I am defining LWORK as LWORK=MAXN*MAXN*8 and it has been working perfectly fine for smaller matrices. I tried several things such as LWORK query, increasing the number of processors, changed the value of NB, etc, but none of those solved the problem. I thought that maybe something was going wrong during definition of the input matrix A (for example elements in the term diag([...can take on very small values that would make PSSYEV to crash later on maybe because of some division by zero or a maximum interation reached, etc). Then I gave a try using my own data but the error is exactly the same, the problem with LWORK persists. As you can see in the message below, the code executes completely, however, the computation is incorrect. I am only printing the first 10 largest eigenvalues. I am not exhausting the memory, I have plenty (24 GB) for this type of calculation. Does anyone one has an idea on why the code is crashing..??.

Code: Select all
helios 619% mpirun -np 4 sample_pssyev_call
 ....READING..OK..
{    0,    1}:  On entry to PSSYEV parameter number   14 had an illegal value
{    1,    0}:  On entry to PSSYEV parameter number   14 had an illegal value
{    1,    1}:  On entry to PSSYEV parameter number   14 had an illegal value
{    0,    0}:  On entry to PSSYEV parameter number   14 had an illegal value
  N =       39600
  A =  hilb(N) + diag([1:-1/N:1/N])
  W(       39600 )=   0.0000000     ;
  W(       39599 )=   0.0000000     ;
  W(       39598 )=   0.0000000     ;
  W(       39597 )=   0.0000000     ;
  W(       39596 )=   0.0000000     ;
  W(       39595 )=   0.0000000     ;
  W(       39594 )=   0.0000000     ;
  W(       39593 )=   0.0000000     ;
  W(       39592 )=   0.0000000     ;
  W(       39591 )=   0.0000000     ;
  W(       39590 )=   0.0000000     ;
  backerror = A - Z * diag(W) * Z'
  resid = A * Z - Z * diag(W)
  ortho = Z' * Z - eye(N)
  norm(backerror)
  norm(resid)
  norm(ortho)
2linear
 
Posts: 17
Joined: Sun Jul 29, 2007 3:42 am
Location: montreal

Re: Scalapack example sample_pssyev_call.f fails for big mat

Postby admin » Tue Apr 05, 2011 8:39 am

You should try to call the PSSYEV routine with LWORK=-1, that will give you the MINIMUM size for the workspace needed.
The required workspace is returned as the first element of WORK.
Julie
admin
Site Admin
 
Posts: 468
Joined: Wed Dec 08, 2004 7:07 pm

Re: Scalapack example sample_pssyev_call.f fails for big mat

Postby 2linear » Fri Apr 08, 2011 10:16 am

Hi Julie,
THANKS a lot for your response. Sorry for the delay but I was traveling. As I said in my previous post I did try the workspace query but it didn't solve the problem, I got the same problem. You can find the code below. I would like to know if you can reproduce the same problem if you use the code below. The code posted uses a matrix of 23400x23400 and for that case the workspace query gave me LWORK=137053791. When I use small matrices there is not problems. For example the following are the results for N=50, 5000 and 10000:


FOR N=50
Code: Select all
 ...WORKSPACE...         999
  N =          50
  A =  hilb(N) + diag([1:-1/N:1/N])
  W(          50 )=   2.9754734     ;
  W(          49 )=   1.4921983     ;
  W(          48 )=   1.0202849     ;
  W(          47 )=  0.96556306     ;
  W(          46 )=  0.94131076     ;
  W(          45 )=  0.91875786     ;
  W(          44 )=  0.89694667     ;
  W(          43 )=  0.87555450     ;
  W(          42 )=  0.85443228     ;
  W(          41 )=  0.83349717     ;
  W(          40 )=  0.81269896     ;
  backerror = A - Z * diag(W) * Z'
  resid = A * Z - Z * diag(W)
  ortho = Z' * Z - eye(N)
  norm(backerror)
  norm(resid)
  norm(ortho)


FOR N=5000
Code: Select all
 ...WORKSPACE...     6284999
  N =        5000
  A =  hilb(N) + diag([1:-1/N:1/N])
  W(        5000 )=   3.5268095     ;
  W(        4999 )=   2.3897531     ;
  W(        4998 )=   1.5850198     ;
  W(        4997 )=   1.2004381     ;
  W(        4996 )=   1.0537189     ;
  W(        4995 )=   1.0091976     ;
  W(        4994 )=   1.0001695     ;
  W(        4993 )=  0.99946982     ;
  W(        4992 )=  0.99920326     ;
  W(        4991 )=  0.99895924     ;
  W(        4990 )=  0.99872619     ;
  backerror = A - Z * diag(W) * Z'
  resid = A * Z - Z * diag(W)
  ortho = Z' * Z - eye(N)
  norm(backerror)
  norm(resid)
  norm(ortho)

FOR N=10000

Code: Select all
 ...WORKSPACE...    25070000
  N =       10000
  A =  hilb(N) + diag([1:-1/N:1/N])
  W(       10000 )=   3.5754237     ;
  W(        9999 )=   2.4921775     ;
  W(        9998 )=   1.6786052     ;
  W(        9997 )=   1.2593535     ;
  W(        9996 )=   1.0828202     ;
  W(        9995 )=   1.0202723     ;
  W(        9994 )=   1.0028454     ;
  W(        9993 )=  0.99988818     ;
  W(        9992 )=  0.99968803     ;
  W(        9991 )=  0.99955648     ;
  W(        9990 )=  0.99943531     ;
  backerror = A - Z * diag(W) * Z'
  resid = A * Z - Z * diag(W)
  ortho = Z' * Z - eye(N)
  norm(backerror)
  norm(resid)
  norm(ortho)


Now, for very large matrices (that still fit my memory of course) the code crashes after a few hours running and got the following error: mpirun noticed that process rank 0 with PID 8457 on node arxt27 exited on signal 9 (Killed). In all the experiments I always kept the parameter MAXMAXN with a value larger than the workspace. Do you have any idea where is the problem ?. THANKS a lot for your help.
JOVI



*
Code: Select all
*
      PROGRAM SAMPLE_PSSYEV_CALL_TEST

      INTEGER :: LWORK, MAXN, LDA, N, MAXMAXN, MAXPROCS
      REAL, PARAMETER :: MONE = -1.0E+0, ZERO = 0.0E+0
*     ..
*     .. Local Scalars ..
      INTEGER :: CONTEXT, I, IAM, INFO, MYCOL, MYROW, NB
      INTEGER :: NPCOL, NPROCS, NPROW
*     ..
*     .. Local Arrays ..
       INTEGER :: DESCA( 50 ), DESCZ( 50 )
       REAL, DIMENSION(:), ALLOCATABLE :: WORK
       REAL, DIMENSION(:), ALLOCATABLE :: W
       REAL, DIMENSION(:,:), ALLOCATABLE :: A, Z
*     .. External Subroutines ..
      EXTERNAL ::  BLACS_EXIT, BLACS_GET, BLACS_GRIDEXIT
      EXTERNAL :: BLACS_GRIDINFO, BLACS_GRIDINIT, BLACS_PINFO
      EXTERNAL :: BLACS_SETUP, DESCINIT, PSLAMODHILB, PSLAPRNT
      EXTERNAL :: PSSYEV
*     ..
*     .. Executable Statements ..
*
*
*     Set up the problem
*
      N = 23400
      MAXN=N
      LDA=MAXN
      MAXMAXN=MAXN*MAXN*2
      NB = 1
      NPROW = 2
      NPCOL = 2
      MAXPROCS=512
         ALLOCATE(WORK(MAXMAXN))
*         ALLOCATE(WORK(LWORK))
         ALLOCATE(W(MAXN))
         ALLOCATE(A(LDA,LDA),Z(LDA,LDA))

*
*
*     Initialize the BLACS
*
      CALL BLACS_PINFO( IAM, NPROCS )
      IF( ( NPROCS.LT.1 ) ) THEN
         CALL BLACS_SETUP( IAM, NPROW*NPCOL )
      END IF
*
*
*     Initialize a single BLACS context
*
      CALL BLACS_GET( -1, 0, CONTEXT )
      CALL BLACS_GRIDINIT( CONTEXT, 'R', NPROW, NPCOL )
      CALL BLACS_GRIDINFO( CONTEXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Bail out if this process is not a part of this context.
*
      IF( MYROW.EQ.-1 )
     $   GO TO 20
*
*
*     These are basic array descriptors
*
      CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, CONTEXT, LDA, INFO )
      CALL DESCINIT( DESCZ, N, N, NB, NB, 0, 0, CONTEXT, LDA, INFO )
*
*     Build a matrix that you can create with
*     a one line matlab command:  hilb(n) + diag([1:-1/n:1/n])
*
      CALL PSLAMODHILB( N, A, 1, 1, DESCA, INFO )
*
*     Ask PSSYEV to compute the entire eigendecomposition
*
      LWORK=-1
      CALL PSSYEV( 'V', 'U', N, A, 1, 1, DESCA, W, Z, 1, 1,
     $             DESCZ, WORK, LWORK, INFO )

      LWORK=WORK(1)+1
           IF((MYROW.EQ.0).AND.(MYCOL.EQ.0)) THEN
            PRINT*,'...WORKSPACE...',LWORK
           ENDIF
*
      CALL PSSYEV( 'V', 'U', N, A, 1, 1, DESCA, W, Z, 1, 1,
     $             DESCZ, WORK, LWORK, INFO )

*
*     Print out the eigenvectors
*
*      CALL PSLAPRNT( N, N, Z, 1, 1, DESCZ, 0, 0, 'Z', 6, WORK )
*
*     Print out matlab code which will check the residual
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         PRINT *, ' N =', N
         PRINT *, ' A =  hilb(N) + diag([1:-1/N:1/N])'
*         DO 10 I = 1, N
         DO 10 I = N, N-10, -1
            PRINT *, ' W(', I, ')=', W( I ), ';'
*           WRITE(111,*) W(I)
   10    CONTINUE
         PRINT *, ' backerror = A - Z * diag(W) * Z'' '
         PRINT *, ' resid = A * Z - Z * diag(W)'
         PRINT *, ' ortho = Z'' * Z - eye(N)'
         PRINT *, ' norm(backerror)'
         PRINT *, ' norm(resid)'
         PRINT *, ' norm(ortho)'
      END IF
*
      CALL BLACS_GRIDEXIT( CONTEXT )
*
   20 CONTINUE
*
      CALL BLACS_EXIT( 0 )
*
*
*     Uncomment this line on SUN systems to avoid the useless print out
*
*      CALL IEEE_FLAGS( 'clear', 'exception', 'underflow', '')
*
*
 9999 FORMAT( 'W=diag([', 4E16.12, ']);' )
*
      STOP
      END
*
      SUBROUTINE PSLAMODHILB( N, A, IA, JA, DESCA, INFO )
*
*  -- ScaLAPACK routine (version 1.2) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     May 10, 1996
*
*
*
*
*     .. Parameters ..
      INTEGER            BLOCK_CYCLIC_2D, DLEN_, DT_, CTXT_, M_, N_,
     $                   MB_, NB_, RSRC_, CSRC_, LLD_
      PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DT_ = 1,
     $                   CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
     $                   RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
      REAL               ONE
      PARAMETER          ( ONE = 1.0E+0 )
*     ..
*     .. Scalar Arguments ..
      INTEGER            IA, INFO, JA, N
*     ..
*     .. Array Arguments ..
      INTEGER            DESCA( * )
      REAL               A( * )
*     ..
*     .. Local Scalars ..
      INTEGER            I, J, MYCOL, MYROW, NPCOL, NPROW
*     ..
*     .. External Subroutines ..
      EXTERNAL           BLACS_GRIDINFO, PSELSET
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          REAL
*     ..
*     .. Executable Statements ..
*
*
*     The matlab code for a real matrix is:
*         hilb(n) + diag( [ 1:-1/n:1/n ] )
*     The matlab code for a complex matrix is:
*         hilb(N) + toeplitz( [ 1 (1:(N-1))*i ] )
*
*       This is just to keep ftnchek happy
      IF( BLOCK_CYCLIC_2D*CSRC_*CTXT_*DLEN_*DT_*LLD_*MB_*M_*NB_*N_*
     $    RSRC_.LT.0 )RETURN
*
      INFO = 0
*
      CALL BLACS_GRIDINFO( DESCA( CTXT_ ), NPROW, NPCOL, MYROW, MYCOL )
*
*
      IF( IA.NE.1 ) THEN
         INFO = -3
      ELSE IF( JA.NE.1 ) THEN
         INFO = -4
      END IF
*
      DO 20 J = 1, N
         DO 10 I = 1, N
            IF( I.EQ.J ) THEN
               CALL PSELSET( A, I, J, DESCA,
     $                       ( REAL( N-I+1 ) ) / REAL( N )+ONE /
     $                       ( REAL( I+J )-ONE ) )
            ELSE
               CALL PSELSET( A, I, J, DESCA, ONE/ ( REAL( I+J )-ONE ) )
            END IF
   10    CONTINUE
   20 CONTINUE
*
*
      RETURN
*
*     End of PSLAMODHLIB
*
      END
2linear
 
Posts: 17
Joined: Sun Jul 29, 2007 3:42 am
Location: montreal

Re: Scalapack example sample_pssyev_call.f fails for big mat

Postby 2linear » Sun Apr 10, 2011 7:00 pm

Hi,
This problem is solved, the code works flawlessly. I tried in a different system. It seems I was having some problems with memory.
Thanks
JOVI
2linear
 
Posts: 17
Joined: Sun Jul 29, 2007 3:42 am
Location: montreal


Return to User Discussion

Who is online

Users browsing this forum: No registered users and 1 guest