PDGBTRF/S Problems

Open discussion regarding features, bugs, issues, vendors, etc.

PDGBTRF/S Problems

Postby ashtonaut » Thu Feb 03, 2005 10:52 pm

I am converting Fortran code from serial to parallel. The important step is solving a banded matrix system Ax=b, where A is a real banded matrix. In the serial code (which runs correctly), I call DGBTRF followed by DGBTRS.

In the parallel version, I am using PDGBTRF followed by PDGBTRS. The factorisation appears to complete successfully, but when I call the solve routine, I get "p4_error: interrupt SIGFPE: 8", which I understand is a Floating Point Exception.

I know that the banded 'A' matrix and the right hand side vector 'b' are identical to the serial version, because in order to debug I am running the parallel version on just one processor at the moment. The only difference between the matrices in memory is that the parallel version has more fill-in space above the entries (as required).

I noticed in the documentation that PDGBTRF/S describe the A matrix as "N-by-N real banded", but the corresponding serial routines DGBTRF/S describe A as "M-by-N band matrix".

Why does the parallel version require the A matrix (which is banded) to be square? Is this the reason why my factorise and solve works fine with LAPACK, but not SCALAPACK?

Any help much appreciated,

Ashton Peters
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Postby Stan Tomov » Mon Feb 07, 2005 4:18 am

PDGBTRF is one of the routines whose algorithm does not follow
and does not yet have the full functionality of its LAPACK "equivalent".
Currently, PDGBTRF is implemented only for certain N-by-N real banded
matrices, and as you observed in your previous "Question regarding
PDGBTRF" the current implementation (for example) requires that the
local submatrices are nonsingular. If you are trying to solve M-by-N
problem with M != N the routine will generate an error.

If the above mentioned exceptions are not the case the method should
work and for completeness, here is a very simple example of using
PDGBTRF and PDGBTRS
Code: Select all
      PROGRAM PDGBTRF_EXAMPLE

*
*     This is an example of using PDGBTRF and PDGBTRS.
*     A matrix of size 9x9 is distributed on a 1x3 process
*     grid, factored, and solved in parallel.
*     
*
      INTEGER          ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW
      INTEGER          BWL, BWU, N, NB, LAF, LWORK

      PARAMETER       (BWL = 1, BWU = 1, NB = 3, N = 9)
      PARAMETER       (LAF=(NB+BWU)*(BWL+BWU)+6*(BWL+BWU)*(BWL+2*BWU))
      PARAMETER       (LWORK = LAF)

      INTEGER          DESCA(7), DESCB(7), IPIV(NB), FILL_IN
      DOUBLE PRECISION A(1+2*BWL+2*BWU,NB),B(NB),AF(LAF),WORK(LWORK)

*     
*     INITIALIZE THE PROCESS GRID
*
      NPROW = 1
      NPCOL = 3
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )

*
*     DISTRIBUTE THE MATRIX ON THE PROCESS GRID
*     Initialize the array descriptors for the matrices A and B
*
      DESCA( 1 ) = 501                   ! descriptor type
      DESCA( 2 ) = ICTXT                 ! BLACS process grid handle
      DESCA( 3 ) = N                     ! number of rows in A
      DESCA( 4 ) = NB                    ! Blocking factor of the distribution
      DESCA( 5 ) = 0                     ! size of block rows
      DESCA( 6 ) = 1+2*BWL+2*BWU         ! leading dimension of A
      DESCA( 7 ) = 0                     ! process row for 1st row of A

      DESCB( 1 ) = 502                   ! descriptor type
      DESCB( 2 ) = ICTXT                 ! BLACS process grid handle
      DESCB( 3 ) = N                     ! number of rows in B
      DESCB( 4 ) = NB                    ! Blocking factor of the distribution
      DESCB( 5 ) = 0                     ! size of block rows
      DESCB( 6 ) = NB                    ! leading dimension of B
      DESCB( 7 ) = 0                     ! process row for 1st row of B

*
*     Generate matrices A and B and distribute them to the process grid
*
*           P1          P2           P3         X        B
*
*      / 1   1     .           .           \  / 1 \    / 2 \
*      | 1   1   1 .           .            | | 1 |    | 3 |
*      |     1   1 . 2         .            | | 1 |    | 4 |
*      |         1 . 2   2     .            | | 1 |    | 5 |
*      |           . 2   2   2 .            | | 1 | =  | 6 |
*      |           .     2   2 . 3          | | 1 |    | 7 |
*      |           .         2 . 3   3      | | 1 |    | 8 |
*      |           .           . 3   3   3  | | 1 |    | 9 |
*      \           .           .     3   3 /  \ 1 /    \ 6 /
*
      FILL_IN = BWL+BWU
      IF (MYCOL == 0) THEN
         A(2+FILL_IN,1) = 1.d0
         A(3+FILL_IN,1) = 1.d0
         A(1+FILL_IN,2) = 1.d0
         A(2+FILL_IN,2) = 1.d0
         A(3+FILL_IN,2) = 1.d0
         A(1+FILL_IN,3) = 1.d0
         A(2+FILL_IN,3) = 1.d0
         A(3+FILL_IN,3) = 1.d0
         B(1) = 2
         B(2) = 3
         B(3) = 4
      ELSE IF (MYCOL == 1) THEN
         A(1+FILL_IN,1) = 2.d0
         A(2+FILL_IN,1) = 2.d0
         A(3+FILL_IN,1) = 2.d0
         A(1+FILL_IN,2) = 2.d0
         A(2+FILL_IN,2) = 2.d0
         A(3+FILL_IN,2) = 2.d0
         A(1+FILL_IN,3) = 2.d0
         A(2+FILL_IN,3) = 2.d0
         A(3+FILL_IN,3) = 2.d0
         B(1) = 5
         B(2) = 6
         B(3) = 7
      ELSE IF (MYCOL == 2) THEN 
         A(1+FILL_IN,1) = 3.d0
         A(2+FILL_IN,1) = 3.d0
         A(3+FILL_IN,1) = 3.d0
         A(1+FILL_IN,2) = 3.d0
         A(2+FILL_IN,2) = 3.d0
         A(3+FILL_IN,2) = 3.d0
         A(1+FILL_IN,3) = 3.d0
         A(2+FILL_IN,3) = 3.d0
         B(1) = 8
         B(2) = 9
         B(3) = 6
      END IF

*
*     Perform LU factorization
*
      CALL PDGBTRF( N, BWL, BWU, A(1,1), 1, DESCA, IPIV,
     $     AF, LAF, WORK, LWORK, INFO )

      IF (INFO/=0) THEN
         write(*,*) 'Info flag from PDGBTRF = ',INFO, ', Col = ',MYCOL
         GOTO 100
      END IF

*
*     Solve using the LU factorization from PDGBTRF
*
      CALL PDGBTRS('N', N, BWL, BWU, 1, A(1,1), 1, DESCA, IPIV,
     $     B(1), 1, DESCB, AF, LAF, WORK, LWORK, INFO)

      IF (INFO/=0) THEN
         write(*,*) 'Info flag from PDGBTRS = ',INFO, ', Col = ',MYCOL
         GOTO 100
      END IF

      write(*,200) 'X(',3*MYCOL+1,':',3*MYCOL+NB,') = ',B
 200  format((a2,I3,a1,I3,a4,3(f10.2,2x)))
*
*     Release the process grid and exit BLACS
*
 100  CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )

      STOP
      END
Stan Tomov
 
Posts: 13
Joined: Thu Dec 09, 2004 1:28 pm

Postby ashtonaut » Mon Feb 07, 2005 4:48 pm

Firstly, thank you Stan for the easy to follow reply, I am relatively new to LAPACK, so your sample code is of great assistance.

I think I am a little confused as to the definition of the banded A matrix being 'NxN' in PDGBTRF/S. Does this mean the original A matrix (before it gets stored in banded form) is square, or does it mean when stored in banded form it must be square? I think you mean the former, but I was a little confused by the documentation.

If you do indeed mean the former, then I think my problem should be able to be solved. I am solving a finite element system A*x=b, where A is originally a square (NxN) matrix but is stored in banded form (bwl=bwu~=80, N~=3000), and b is a vector of length N. Therefore I think (please confirm) that I am indeed solving with a 'NxN' A matrix.

I will check my code against your sample code to try and track down where I am going wrong.

Thanks again for yout excellent reply.

Ashton
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Postby ashtonaut » Mon Feb 07, 2005 5:33 pm

Stan, I found an interesting similarity - when I compile and run your sample code I get the same errors I get with my finite element code.

The compile statement is

$ pgf90 -Mscalapack -C -o testpdgbtrfs.opt testpdgbtrfs.f

and the run command is

$ mpirun -np 3 testpdgbtrfs.opt

I get the following

p0_31312: p4_error: interrupt SIGFPE: 8
Killed by signal 2.
Killed by signal 2.
/usr/pgi/linux86-64/5.2/bin/mpirun: line 1: 31312 Broken pipe /home/ape20/fwdsolvers/testing/testpdgbtrfs.opt -p4pg /home/ape20/fwdsolvers/testing/PI31194 -p4wd /home/ape20/fwdsolvers/testing


I have narrowed the problem down to the PDGBTRS call i.e. it is this call that causes these error messages in both your sample code and my own code.

Please advise if you think that this is perhaps a SCALAPACK bug, or alternatively if there is something in my compile and/or run commands that is incorrect.

FYI, our system is as follows:

Ten node Beowulf cluster
Dual AMD Opteron 244 Processors, 2GB ram per node
Rocks Linux 3.3
Portland Group PGI CDK 5.2

Thanks again for your assistance,

Ashton
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Postby Stan Tomov » Tue Feb 08, 2005 3:12 am

Hi Ashton,
I am glad the reply is helpful.
Yes, the original matrix A has to be square (NxN). The
local are in packed format of size DESCA( 6 ) x DESCA( 4 ).
You can not have varying DESCA(4) for the different processors.
If for example N = 10 and you distribute among 3 processors (see the
example below), NB = DESCA(4) should be 4. The "first" 2 processors
will have 4 columns of A each, and the last one will have 2.

I compile and run the example without the problems that you have.
I link with scalapack, blas, atlas, mpi, and several blacs libraries.
The easiest way to figure out how to compile is to go to
your main scalapack directory and do
make exe
This will create your testing routines. They should work and
if they do, link your program as done in creating the testing
routines. I hope this will work.
Below is one more example.
Stan

P.S. I compile and run with something like
ifort -o testing PDGBTRF_test2.f libscalapack.a blacsF77init_MPI-LINUX-0.a blacs_MPI-LINUX-0.a blacsF77init_MPI-LINUX-0.a libf77blas.a libmpich.a
mpirun -np 3 testing


Code: Select all
      PROGRAM PDGBTRF_EXAMPLE

*
*     This is an example of using PDGBTRF and PDGBTRS.
*     A matrix of size 10x10 is distributed on a 1x3 process
*     grid, factored, and solved in parallel.
*     
*
      INTEGER          ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW
      INTEGER          BWL, BWU, N, NB, LAF, LWORK

      PARAMETER       (BWL = 1, BWU = 1, NB = 4, N = 10)
      PARAMETER       (LAF=(NB+BWU)*(BWL+BWU)+6*(BWL+BWU)*(BWL+2*BWU))
      PARAMETER       (LWORK = LAF)

      INTEGER          DESCA(7), DESCB(7), IPIV(NB), FILL_IN
      DOUBLE PRECISION A(1+2*BWL+2*BWU,NB),B(NB),AF(LAF),WORK(LWORK)

*     
*     INITIALIZE THE PROCESS GRID
*
      NPROW = 1
      NPCOL = 3
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )

*
*     DISTRIBUTE THE MATRIX ON THE PROCESS GRID
*     Initialize the array descriptors for the matrices A and B
*
      DESCA( 1 ) = 501                   ! descriptor type
      DESCA( 2 ) = ICTXT                 ! BLACS process grid handle
      DESCA( 3 ) = N                     ! number of rows in A
      DESCA( 4 ) = NB                    ! Blocking factor of the distribution
      DESCA( 5 ) = 0                     ! size of block rows
      DESCA( 6 ) = 1+2*BWL+2*BWU         ! leading dimension of A
      DESCA( 7 ) = 0                     ! process row for 1st row of A


      DESCB( 1 ) = 502                   ! descriptor type
      DESCB( 2 ) = ICTXT                 ! BLACS process grid handle
      DESCB( 3 ) = N                     ! number of rows in B
      DESCB( 4 ) = NB                    ! Blocking factor of the distribution
      DESCB( 5 ) = 0                     ! size of block rows
      DESCB( 6 ) = NB                    ! leading dimension of B
      DESCB( 7 ) = 0                     ! process row for 1st row of B

*
*     Generate matrices A and B and distribute them to the process grid
*
*              P1           P2           P3         X        B
*
*      / 1   1         .               .      \  / 1 \    / 2 \
*      | 1   1   1     .               .       | | 1 |    | 3 |
*      |     1   1   2 .               .       | | 1 |    | 4 |
*      |         1   2 . 2             .       | | 1 |    | 5 |
*      |             2 . 2   2         .       | | 1 | =  | 6 |
*      |               . 2   2   2     .       | | 1 |    | 6 |
*      |               .     2   2   3 .       | | 1 |    | 7 |
*      |               .         2   3 . 3     | | 1 |    | 8 |
*      |               .             3 . 3   3 | | 1 |    | 9 |
*      \               .               . 3   3 / \ 1 /    \ 6 /
*                                         
*
      FILL_IN = BWL+BWU
      IF (MYCOL == 0) THEN
         A(2+FILL_IN,1) = 1.d0
         A(3+FILL_IN,1) = 1.d0
         A(1+FILL_IN,2) = 1.d0
         A(2+FILL_IN,2) = 1.d0
         A(3+FILL_IN,2) = 1.d0
         A(1+FILL_IN,3) = 1.d0
         A(2+FILL_IN,3) = 1.d0
         A(3+FILL_IN,3) = 1.d0
         A(1+FILL_IN,4) = 2.d0
         A(2+FILL_IN,4) = 2.d0
         A(3+FILL_IN,4) = 2.d0
         B(1) = 2.d0
         B(2) = 3.d0
         B(3) = 4.d0
         B(4) = 5.d0
      ELSE IF (MYCOL == 1) THEN
         A(1+FILL_IN,1) = 2.d0
         A(2+FILL_IN,1) = 2.d0
         A(3+FILL_IN,1) = 2.d0
         A(1+FILL_IN,2) = 2.d0
         A(2+FILL_IN,2) = 2.d0
         A(3+FILL_IN,2) = 2.d0
         A(1+FILL_IN,3) = 2.d0
         A(2+FILL_IN,3) = 2.d0
         A(3+FILL_IN,3) = 2.d0
         A(1+FILL_IN,4) = 3.d0
         A(2+FILL_IN,4) = 3.d0
         A(3+FILL_IN,4) = 3.d0
         B(1) = 6.d0
         B(2) = 6.d0
         B(3) = 7.d0
         B(4) = 8.d0
      ELSE IF (MYCOL == 2) THEN 
         A(1+FILL_IN,1) = 3.d0
         A(2+FILL_IN,1) = 3.d0
         A(3+FILL_IN,1) = 3.d0
         A(1+FILL_IN,2) = 3.d0
         A(2+FILL_IN,2) = 3.d0
         B(1) = 9.d0
         B(2) = 6.d0
      END IF

*
*     Perform LU factorization
*
      CALL PDGBTRF( N, BWL, BWU, A(1,1), 1, DESCA, IPIV,
     $     AF, LAF, WORK, LWORK, INFO )

      IF (INFO/=0) THEN
         write(*,*) 'Info flag from PDGBTRF = ',INFO, ', Col = ',MYCOL
         GOTO 100
      END IF

*
*     Solve using the LU factorization from PDGBTRF
*
      CALL PDGBTRS('N', N, BWL, BWU, 1, A(1,1), 1, DESCA, IPIV,
     $     B(1), 1, DESCB, AF, LAF, WORK, LWORK, INFO)

      IF (INFO/=0) THEN
         write(*,*) 'Info flag from PDGBTRS = ',INFO, ', Col = ',MYCOL
         GOTO 100
      END IF

      write(*,200) 'X(processor ',MYCOL,') = ',B(1:DESCB(4))
 200  format((a12,I3,a4,4(f10.2,2x)))
*
*     Release the process grid and exit BLACS
*
 100  CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )

      STOP
      END
Last edited by Stan Tomov on Thu Feb 10, 2005 3:54 pm, edited 2 times in total.
Stan Tomov
 
Posts: 13
Joined: Thu Dec 09, 2004 1:28 pm

Postby ashtonaut » Wed Feb 09, 2005 4:59 pm

I have sent a support request to the Portland Group about this problem, maybe I am doing something wrong when compiling, or maybe there is a problem with the PGI compilers.

If anyone reading this thread has Portland Group compilers, would they mind compiling and running Stan's example code to test if it works? If possible, use the same compile statement as I did.

The PGI User's Guide says that when I use the option -Mscalapack, the following libraries are linked:

scalapack.a
blacsCinit_MPI-LINUX-0.a
blacs_MPI-LINUX-0.a
blacsF77init_MPI-LINUX-0.a
libblas.a
libmpich.a

Stan, do you see that there is anything missing here that might cause the problem I am having?

Thanks again,

Ashton
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

update

Postby ashtonaut » Wed Feb 16, 2005 7:20 pm

A brief update...

I (finally) got a reply from PGI:

Here is what we currently have determined.

The actual failure is caused by a divide by zero in the MOD function
at line 542 of the scalapack file "pdgbtrs.f". However, the variable
NB should not be zero here. I appears that one of the three processes
is mangling the values of the DESCA array in the PDGBTRF, even though
DESCA is not referenced in the part of the code where mangling occurs.

This smells like an array stepping on another piece of data, or
a compiler bug, but that seems unlikely when compiling -O0 or -g
still causes it. The error still occurs.

We are continuing to investigate.


They seem to think it is either a bug in the compiler or bad code.

I will update this thread when I get a resolution.

Ashton
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Postby ashtonaut » Wed Apr 27, 2005 7:58 pm

Just a quick update: We have received no further response from PGI, and are no longer using their compilers.
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Postby ashtonaut » Thu Apr 28, 2005 1:39 am

Stan,

I am now testing the first sample code you provided (9x9 matrix) with the g77 compilers, using the ACML 2.5 math libraries (including ACML Scalapack). I am hoping that it will work where the PGI compilers did not.

However, when I compile and run your code (copied verbatim) I get the following error:

% mpirun -np 3 testpdgbtrsG77.opt

{ 0, 0}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 0
{ 0, 1}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 1
{ 0, 2}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 2

I would assume that there is nothing wrong with your code, so can you suggest what might be causing this error code?

Thanks,

Ashton
ashtonaut
 
Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm


Return to User Discussion

Who is online

Users browsing this forum: Bing [Bot] and 4 guests

cron