Hello Ivan,
Aarghh, yes, this is our fault; I spotted this problem recently. Sorry
about that.
Here is a quick workaround, and we'll patch ScaLAPACK pdsyevx soon.
The problem is that we write three values into WORK when LWORK=-1 in
PDSYEVX. So what you are doing, calling PDSYEVX with LWORK=-1 and passing
the scalar CLWORK, does not work.
The workaround is to declare CLWORK as an array of three COMPLEX:
COMPLEX :: CLWORK(3)
After that, you should be good.
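To illustrate why the three-element declaration matters, here is a small self-contained sketch. MOCK_QUERY is a hypothetical stand-in, not a ScaLAPACK routine, and the size it returns is made up; it only mimics a workspace query (LWORK = -1) that writes three values into the array it is given.

```fortran
! Mock demonstration only. MOCK_QUERY is a hypothetical stand-in for a
! workspace query; it is NOT part of ScaLAPACK.
PROGRAM QUERY_DEMO
  IMPLICIT NONE
  COMPLEX :: CLWORK(3)               ! three elements, as in the workaround
  COMPLEX, ALLOCATABLE :: WORK(:)
  INTEGER :: LWORK

  ! Query step: LW = -1 asks only for the required size.
  CALL MOCK_QUERY(CLWORK, -1)

  ! The size comes back in the first element of the query array.
  LWORK = INT(REAL(CLWORK(1)))
  ALLOCATE(WORK(LWORK))
  PRINT *, 'LWORK =', LWORK, ' SIZE(WORK) =', SIZE(WORK)
  DEALLOCATE(WORK)

CONTAINS

  SUBROUTINE MOCK_QUERY(W, LW)
    COMPLEX, INTENT(INOUT) :: W(*)
    INTEGER, INTENT(IN)    :: LW
    IF (LW == -1) THEN
      W(1) = CMPLX(1024.0, 0.0)      ! made-up workspace size
      W(2) = CMPLX(0.0, 0.0)         ! these two extra writes would land
      W(3) = CMPLX(0.0, 0.0)         ! outside a scalar CLWORK
    END IF
  END SUBROUTINE MOCK_QUERY

END PROGRAM QUERY_DEMO
```

With a scalar CLWORK, the second and third writes would go past the end of the variable and corrupt whatever happens to sit next to it in memory, which is consistent with a crash inside the query call.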
Good luck with the 72,000x72,000 matrix...
What is the value of IU-IL+1?
By the way, there is a new routine based on MRRR (pdsyevr) available as a
prerelease for testing. Like pdsyevx, it can find a subset of the
eigenvalues, and it might be significantly faster than pdsyevx,
although if IU-IL is small, it's not going to make any difference.
Send us an email if you want to give it a try.
Julien
On Tue, 14 Mar 2006, Ivan Silvestre Paganini Marin wrote:
Sorry again, I just forgot the size of the matrix. It is variable, from
216x216 to over 72000x72000. This is determined by the variable DIMAX.
Ivan
On Tue, 14 Mar 2006, Ivan Silvestre Paganini Marin wrote:
Hello Julien! Thank you for the answer. I am still running on one
machine, for testing, with four processes, but I am planning to run on a
Beowulf cluster with 12 processors. Here is part of my code:
INTEGER, ALLOCATABLE, DIMENSION(:) :: DESCHT
REAL, ALLOCATABLE, DIMENSION(:) :: RWORK
COMPLEX, ALLOCATABLE, DIMENSION(:) :: WORK
INTEGER, ALLOCATABLE, DIMENSION(:) :: IWORK
INTEGER, ALLOCATABLE, DIMENSION(:) :: IFAIL
INTEGER, ALLOCATABLE, DIMENSION(:) :: ICLUSTR
REAL, ALLOCATABLE, DIMENSION(:) :: GAP
COMPLEX, ALLOCATABLE, DIMENSION(:,:) :: AUXVETOR
INTEGER :: LWORK,LRWORK,LIWORK
INTEGER :: LDROW,LDCOL,LINUM,LD_AUXVCT2
INTEGER :: NBNEIG
INTEGER :: IL,IU
INTEGER :: INFO
REAL :: ABSTOL,VL,VU
INTEGER :: NEIG
INTEGER :: NZ
COMPLEX :: CLWORK
REAL :: RLRWORK
REAL :: ORFAC = 1.0
INTEGER :: DLEN
EXTERNAL PCHEEVX
COMPLEX, ALLOCATABLE, DIMENSION(:,:) :: HT
CALL BLACS_PINFO( IAM, NPROCS )
CALL BLACS_GRIDINFO( CONTEXT, NPROW, NPCOL, MYROW, MYCOL )
DLEN = BLAS_DLEN(NPROW*NPCOL,DIMAX)
ALLOCATE(DESCHT(DLEN),DESCVT(DLEN))
CALL DESCINIT( DESCHT, DIMAX, DIMAX, MATRIX_TYPE, MATRIX_TYPE, 0, 0,&
&CONTEXT, MXLLDA, INFO )
LDCOL = NUMROC(DIMAX, MATRIX_TYPE , MYCOL , 0 , NPROW)
LDROW = NUMROC(DIMAX, MATRIX_TYPE , MYROW , 0 , NPCOL)
ALLOCATE(IFAIL(DIMAX),ICLUSTR(2*NPROW*NPCOL),GAP(NPROW*NPCOL))
ALLOCATE(AUXVETOR(LDROW,LDCOL))
ABSTOL = 2.0 * SLAMCH('S')
ALLOCATE(HT(LDROW,LDCOL))
HT = CMPLX(0.0,0.0)
ALLOCATE(VALOR(DIMAX))
CALL PCHEEVX('V','I','L',DIMAX,HT,1,1,DESCHT,VL,VU,IL,IU,ABSTOL,NEIG,&
&NZ,VALOR,ORFAC,AUXVETOR,1,1,DESCHT,CLWORK,-1,RLRWORK,-1,&
&LIWORK,-1,IFAIL,ICLUSTR,GAP,INFO)
LWORK = INT(CLWORK)
LRWORK = INT(RLRWORK)
ALLOCATE(WORK(LWORK),RWORK(LRWORK),IWORK(LIWORK))
CALL PCHEEVX('V','I','L',DIMAX,HT,1,1,DESCHT,VL,VU,IL,IU,ABSTOL,NEIG,&
&NZ,VALOR,ORFAC,AUXVETOR,1,1,DESCHT,&
&WORK,LWORK,RWORK,LRWORK,IWORK,LIWORK,IFAIL,ICLUSTR,GAP,INFO)
This code is enclosed in a subroutine, and DIMAX is a parameter passed
to this subroutine, just like MATRIX_TYPE. This is essentially my
diagonalization routine, apart from some calculations for the submatrices
that are specific to my physics problem. So, what could the error be? I
have checked several times the parameters and variables that are
passed to this subroutine, including just before the first PCHEEVX call.
I have some MPI calls in the main program (MPI_BCASTs and MPI_ALLGATHER).
I have an MPI_Init and MPI_Finalize in the main routine, together with
BLACS_GRIDINIT and BLACS_EXIT. Maybe this is a problem?
Another question: I have tried to debug my code using TotalView 7, but
when the error is inside PCHEEVX, the debugger just gives me
assembler, not the source code (which I think would be very useful). I
have recompiled ScaLAPACK without the optimization flag (-O3) and with
-g, but I still just get assembler when I try to debug. Is there any
other way or, better still, another debugger for this?
If it is necessary, or if you have the patience, I can send you the
entire subroutine.
Many thanks!
Ivan
PS: Should I send this email to the scalapack list again?
On Tue, 2006-03-14 at 09:35 -0500, Julien Langou wrote:
What is the number of processors and the size of the matrix?
Can we get the few lines in your code where you call pcheevx?
Julien
On Mon, 13 Mar 2006, Ivan Silvestre Paganini Marin wrote:
Correction: I forgot to mention in my last email (below) that the error
appears in the PCHEEVX call that determines the workspace sizes for the
rest of the program (setting some parameters to -1 to compute the sizes
of WORK, LWORK, etc.)
Sorry, and thanks again!
Hello everybody! I am trying to parallelize my sequential application
that uses LAPACK. I am writing the code using ScaLAPACK (from netlib,
compiled with pgf90), using the routine PCHEEVX. The code is very large and
probably has tons of bugs, but this one is kind of elusive to me. When
the routine PCHEEVX is called to determine the parameters for the
computation, I get the following error (mpirun -np 4):
++++++++++++++++++++++++++++++++
p0_14109: p4_error: interrupt SIGx: 4
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
++++++++++++++++++++++++++++++++
I get this error with the libscalapack from ACML 2.5.0 and with the one
compiled with pgf90 from PGI. All the tests, from LAPACK to ScaLAPACK by
way of the BLACS routines, have been run successfully with the same MPICH
that I use for the main program. Could this be an MPICH error? I really
think it is a bug in my use of PCHEEVX, but some help with this error
would help a lot...
Many thanks.
Ivan Marin
Laboratório de Física Computacional
Instituto de Física de São Carlos
Universidade de São Paulo
_______________________________________________
Scalapack mailing list
Scalapack@Domain.Removed
http://lists.cs.utk.edu/listinfo/scalapack
