I think there may be something wrong in the pspotrf() function!

Postby hevensun » Tue Jun 20, 2006 9:47 pm

Hi,
I have been testing ScaLAPACK 1.7.4 and ran into a problem. My setup:

Machine: HP-SC 45
Compiler: Compaq f77 -O4
MPI: mpich
BLACS tester: all tests passed

I ran all the test executables. Most of them pass, but xsllt, xdllt, xsinv and xdinv do not work with some specific input data. I traced xsllt and xsinv: both hang in pspotrf(), and xdllt and xdinv may have the same problem. On a 1x2 process grid, the hang seems to appear when the matrix order is large, for example 1024. Is there a bug in the pspotrf() function?

The INV.dat is:

'ScaLAPACK, Version 1.0, Matrix Inversion Testing input file'
'MPI machine.'
'INV.out' output file name (if any)
6 device out
1 number of matrix types (next line)
'UPD' 'LPD ' GEN, UTR, LTR, UPD, LPD
1 number of problems sizes
1024 values of N
2 number of NB's
1 2 3 4 5 values of NB
1 number of process grids (ordered P & Q)
1 1 values of P
2 3 values of Q
100.0 threshold

The LLT.dat is:

'ScaLAPACK, LLt factorization input file'
'MPI machine'
'LLT.out' output file name (if any)
6 device out
'U' define Lower or Upper
1 number of problems sizes
1024 1025 values of N
5 number of NB's
1 2 3 4 5 values of NB
2 number of NRHS's
1 3 9 28 values of NRHS
2 number of NBRHS's
1 3 5 7 values of NBRHS
1 number of process grids (ordered pairs P & Q)
1 1 values of P
2 3 values of Q
3.0 threshold
T (T or F) Test Cond. Est. and Iter. Ref. Routines
hevensun
 
Posts: 8
Joined: Wed Aug 31, 2005 9:20 pm

Postby hevensun » Thu Jun 22, 2006 1:37 am

For xsllt, I traced further to find where the program hangs inside pspotrf(). I added some write(*,*) statements to narrow it down. The relevant part of pspotrf.f now looks like this:

*           Perform unblocked Cholesky factorization on JB block
*
            write(*,*) myrow,mycol,' before the ',J,'th pspotrf2 '
            CALL PSPOTF2( UPLO, JB, A, I, J, DESCA, INFO )
            write(*,*) myrow,mycol,' after the ',J,'th pspotrf2'
            IF( INFO.NE.0 ) THEN
               INFO = INFO + J - JA
               GO TO 30
            END IF
*
            IF( J-JA+JB+1.LE.N ) THEN
*
*              Form the row panel of U using the triangular solver
*
               write(*,*) myrow,mycol,' before the ',J,'th pstrsm()'
               CALL PSTRSM( 'Left', UPLO, 'Transpose', 'Non-Unit',
     $                      JB, N-J-JB+JA, ONE, A, I, J, DESCA, A,
     $                      I, J+JB, DESCA )
*
*              Update the trailing matrix, A = A - U'*U
*
               write(*,*) myrow,mycol,' between the two pssyrk pstrsm',J
               CALL PSSYRK( UPLO, 'Transpose', N-J-JB+JA, JB,
     $                      -ONE, A, I, J+JB, DESCA, ONE, A, I+JB,
     $                      J+JB, DESCA )
               write(*,*) myrow,mycol,' after the ',J,'the pssyrk() all'
            END IF
   10    CONTINUE
*
      ELSE
*
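
For readers following the excerpt: the three calls are the steps of a right-looking blocked Cholesky factorization for the upper-triangular case. The following is a minimal serial sketch in plain Python, not the ScaLAPACK code. The function name blocked_cholesky_upper and the list-of-lists storage are assumptions made for illustration, and no data distribution or communication is modelled. It only shows the order of the steps that PSPOTF2, PSTRSM and PSSYRK perform on each block column.

```python
import math


def blocked_cholesky_upper(a, nb):
    """Overwrite the upper triangle of a (list of lists) with U, A = U^T U."""
    n = len(a)
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # Step 1 (role of PSPOTF2): unblocked Cholesky of the diagonal block.
        for k in range(j, j + jb):
            pivot = a[k][k] - sum(a[i][k] ** 2 for i in range(j, k))
            if pivot <= 0.0:
                # Plays the role of a nonzero INFO in the Fortran routine.
                raise ValueError("not positive definite at column %d" % k)
            a[k][k] = math.sqrt(pivot)
            for c in range(k + 1, j + jb):
                s = sum(a[i][k] * a[i][c] for i in range(j, k))
                a[k][c] = (a[k][c] - s) / a[k][k]
        # Step 2 (role of PSTRSM): row panel, solve U11^T * U12 = A12.
        for c in range(j + jb, n):
            for k in range(j, j + jb):
                s = sum(a[i][k] * a[i][c] for i in range(j, k))
                a[k][c] = (a[k][c] - s) / a[k][k]
        # Step 3 (role of PSSYRK): trailing update A22 = A22 - U12^T * U12.
        for r in range(j + jb, n):
            for c in range(r, n):
                a[r][c] -= sum(a[k][r] * a[k][c] for k in range(j, j + jb))
    return a
```

Only the upper triangle is read and written; the entries below the diagonal are left as they were. On the last block column the loops for steps 2 and 3 are empty, which matches the IF( J-JA+JB+1.LE.N ) guard in the excerpt.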


The output shows that the program hangs in the pstrsm() function. It looks like this:
................................
0 1 before the 82th pspotrf2
0 1 after the 82th pspotrf2
0 1 before the 82th pstrsm()
0 1 between the two pssyrk pstrsm 82
0 0 between the two pssyrk pstrsm 80
0 0 after the 80the pssyrk() all
0 0 before the 81th pspotrf2
0 0 after the 81th pspotrf2
0 0 before the 81th pstrsm()
0 0 between the two pssyrk pstrsm 81
0 0 after the 81the pssyrk() all
0 0 before the 82th pspotrf2
0 0 after the 82th pspotrf2
0 0 before the 82th pstrsm()
0 0 between the two pssyrk pstrsm 82
0 0 after the 82the pssyrk() all
0 0 before the 83th pspotrf2
0 0 after the 83th pspotrf2
0 0 before the 83th pstrsm()
0 1 after the 82the pssyrk() all
0 1 before the 83th pspotrf2
0 1 after the 83th pspotrf2
0 1 before the 83th pstrsm()
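
A trace like the one above can be summarized by keeping only the last message each process printed. Here is a small sketch in Python, assuming every trace line starts with myrow and mycol followed by the message, as in the write(*,*) statements added to pspotrf.f. The name last_message_per_process is made up for this example.

```python
import re

# Matches lines such as "0 1 before the 83th pstrsm()":
# two integers (myrow, mycol) followed by the message text.
TRACE_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.*\S)\s*$")


def last_message_per_process(trace_text):
    """Return {(myrow, mycol): last message printed by that process}."""
    last = {}
    for line in trace_text.splitlines():
        match = TRACE_LINE.match(line)
        if match is None:
            continue  # skip blank lines and separators such as "....."
        myrow, mycol, message = match.groups()
        last[(int(myrow), int(mycol))] = message
    return last
```

One caveat: output from different MPI processes may be buffered and interleaved, so the last line seen for a process is a hint about where it stopped, not proof.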


It's clear that both processes hang in pstrsm() at the 83rd call (J = 83). What should I do about this problem?
Thanks.

from hevensun

Postby Julie » Mon Jun 26, 2006 12:09 pm

Hevensun,

I ran the tests with your input data on an Alpha cluster with the Compaq f77 compiler. No problems for me; everything went through.

I just changed the TOTMEM variable to increase the memory.
Be careful not to increase it too much: you still need memory for the OS, the BLACS, etc.

Code:
*  TOTMEM   INTEGER, default = 2000000
*           TOTMEM is a machine-specific parameter indicating the
*           maximum amount of available memory in bytes.
*           The user should customize TOTMEM to his platform.  Remember
*           to leave room in memory for the operating system, the BLACS
*           buffer, etc.  For example, on a system with 8 MB of memory
*           per process (e.g., one processor on an Intel iPSC/860), the
*           parameters we use are TOTMEM=6200000 (leaving 1.8 MB for OS,
*           code, BLACS buffer, etc).  However, for PVM, we usually set
*           TOTMEM = 2000000.  Some experimenting with the maximum value
*           of TOTMEM may be required.
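
To get a feel for the numbers, here is a rough sketch in Python that estimates how many bytes one process needs just to hold its local part of the N x N test matrix. The numroc function repeats the arithmetic of ScaLAPACK's NUMROC tool routine. The helper local_matrix_bytes, the source process 0 and the 4 bytes per single-precision element are assumptions made for this example. The drivers presumably need further workspace on top of this, so treat the result as a lower bound only.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclic dimension owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # blocks every process receives
    extrablks = nblocks % nprocs           # leftover whole blocks
    if mydist < extrablks:
        count += nb                        # this process gets one extra block
    elif mydist == extrablks:
        count += n % nb                    # this process gets the partial block
    return count


def local_matrix_bytes(n, nb, myrow, mycol, nprow, npcol, bytes_per_element=4):
    """Bytes for the local part of an n x n matrix (hypothetical helper)."""
    rows = numroc(n, nb, myrow, 0, nprow)
    cols = numroc(n, nb, mycol, 0, npcol)
    return rows * cols * bytes_per_element


# N = 1024, NB = 1, 1 x 2 grid, process (0, 0), single precision.
needed = local_matrix_bytes(1024, 1, 0, 0, 1, 2)
```

For this case the local matrix is 1024 x 512 elements, which is 2,097,152 bytes and already more than the documented default of TOTMEM = 2000000. Whether the default was in effect in the original build is not stated in the thread.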
Julie
 
Posts: 299
Joined: Wed Feb 23, 2005 12:32 am
Location: ICL, Denver. Colorado

