PDGBTRF/S Problems

Open discussion regarding features, bugs, issues, vendors, etc.

PDGBTRF/S Problems

I am converting Fortran code from serial to parallel. The important step is solving a banded matrix system Ax=b, where A is a real banded matrix. In the serial code (which runs correctly), I call DGBTRF followed by DGBTRS.

In the parallel version, I am using PDGBTRF followed by PDGBTRS. The factorisation appears to complete successfully, but when I call the solve routine, I get "p4_error: interrupt SIGFPE: 8", which I understand is a Floating Point Exception.

I know that the banded 'A' matrix and the right hand side vector 'b' are identical to the serial version, because in order to debug I am running the parallel version on just one processor at the moment. The only difference between the matrices in memory is that the parallel version has more fill-in space above the entries (as required).

I noticed in the documentation that PDGBTRF/S describe the A matrix as "N-by-N real banded", but the corresponding serial routines DGBTRF/S describe A as "M-by-N band matrix".

Why does the parallel version require the A matrix (which is banded) to be square? Is this the reason why my factorise and solve works fine with LAPACK, but not SCALAPACK?

Any help much appreciated,

Ashton Peters
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

PDGBTRF is one of the routines whose algorithm does not follow
and does not yet have the full functionality of its LAPACK "equivalent".
Currently, PDGBTRF is implemented only for certain N-by-N real banded
matrices, and as you observed in your previous "Question regarding
PDGBTRF" the current implementation (for example) requires that the
local submatrices are nonsingular. If you are trying to solve M-by-N
problem with M != N the routine will generate an error.

If the above mentioned exceptions are not the case the method should
work and for completeness, here is a very simple example of using
PDGBTRF and PDGBTRS
Code: Select all
`      PROGRAM PDGBTRF_EXAMPLE**     This is an example of using PDGBTRF and PDGBTRS.*     A matrix of size 9x9 is distributed on a 1x3 process *     grid, factored, and solved in parallel.*     *      INTEGER          ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW      INTEGER          BWL, BWU, N, NB, LAF, LWORK      PARAMETER       (BWL = 1, BWU = 1, NB = 3, N = 9)      PARAMETER       (LAF=(NB+BWU)*(BWL+BWU)+6*(BWL+BWU)*(BWL+2*BWU))      PARAMETER       (LWORK = LAF)      INTEGER          DESCA(7), DESCB(7), IPIV(NB), FILL_IN      DOUBLE PRECISION A(1+2*BWL+2*BWU,NB),B(NB),AF(LAF),WORK(LWORK)*     *     INITIALIZE THE PROCESS GRID*      NPROW = 1      NPCOL = 3      CALL SL_INIT( ICTXT, NPROW, NPCOL )      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )**     DISTRIBUTE THE MATRIX ON THE PROCESS GRID*     Initialize the array descriptors for the matrices A and B*      DESCA( 1 ) = 501                   ! descriptor type      DESCA( 2 ) = ICTXT                 ! BLACS process grid handle      DESCA( 3 ) = N                     ! number of rows in A      DESCA( 4 ) = NB                    ! Blocking factor of the distribution      DESCA( 5 ) = 0                     ! size of block rows       DESCA( 6 ) = 1+2*BWL+2*BWU         ! leading dimension of A      DESCA( 7 ) = 0                     ! process row for 1st row of A      DESCB( 1 ) = 502                   ! descriptor type      DESCB( 2 ) = ICTXT                 ! BLACS process grid handle      DESCB( 3 ) = N                     ! number of rows in B      DESCB( 4 ) = NB                    ! Blocking factor of the distribution      DESCB( 5 ) = 0                     ! size of block rows       DESCB( 6 ) = NB                    ! leading dimension of B      DESCB( 7 ) = 0                     ! process row for 1st row of B**     Generate matrices A and B and distribute them to the process grid**           P1          P2           P3         X        B**      / 1   1     .           .           \  / 1 \    / 2 \*      | 1   1   1 .           .            | | 1 |    | 3 |*      |     1   1 . 2         .            | | 1 |    | 4 |*      |         1 . 2   2     .            | | 1 |    | 5 |*      |           . 2   2   2 .            | | 1 | =  | 6 |*      |           .     2   2 . 3          | | 1 |    | 7 |*      |           .         2 . 3   3      | | 1 |    | 8 |*      |           .           . 3   3   3  | | 1 |    | 9 |*      \           .           .     3   3 /  \ 1 /    \ 6 /*      FILL_IN = BWL+BWU      IF (MYCOL == 0) THEN         A(2+FILL_IN,1) = 1.d0         A(3+FILL_IN,1) = 1.d0         A(1+FILL_IN,2) = 1.d0         A(2+FILL_IN,2) = 1.d0         A(3+FILL_IN,2) = 1.d0         A(1+FILL_IN,3) = 1.d0         A(2+FILL_IN,3) = 1.d0         A(3+FILL_IN,3) = 1.d0         B(1) = 2         B(2) = 3         B(3) = 4      ELSE IF (MYCOL == 1) THEN         A(1+FILL_IN,1) = 2.d0         A(2+FILL_IN,1) = 2.d0         A(3+FILL_IN,1) = 2.d0         A(1+FILL_IN,2) = 2.d0         A(2+FILL_IN,2) = 2.d0         A(3+FILL_IN,2) = 2.d0         A(1+FILL_IN,3) = 2.d0         A(2+FILL_IN,3) = 2.d0         A(3+FILL_IN,3) = 2.d0         B(1) = 5         B(2) = 6         B(3) = 7      ELSE IF (MYCOL == 2) THEN           A(1+FILL_IN,1) = 3.d0         A(2+FILL_IN,1) = 3.d0         A(3+FILL_IN,1) = 3.d0         A(1+FILL_IN,2) = 3.d0         A(2+FILL_IN,2) = 3.d0         A(3+FILL_IN,2) = 3.d0         A(1+FILL_IN,3) = 3.d0         A(2+FILL_IN,3) = 3.d0         B(1) = 8         B(2) = 9         B(3) = 6      END IF**     Perform LU factorization*      CALL PDGBTRF( N, BWL, BWU, A(1,1), 1, DESCA, IPIV,     \$     AF, LAF, WORK, LWORK, INFO )      IF (INFO/=0) THEN         write(*,*) 'Info flag from PDGBTRF = ',INFO, ', Col = ',MYCOL         GOTO 100       END IF**     Solve using the LU factorization from PDGBTRF*      CALL PDGBTRS('N', N, BWL, BWU, 1, A(1,1), 1, DESCA, IPIV,      \$     B(1), 1, DESCB, AF, LAF, WORK, LWORK, INFO)       IF (INFO/=0) THEN         write(*,*) 'Info flag from PDGBTRS = ',INFO, ', Col = ',MYCOL         GOTO 100      END IF      write(*,200) 'X(',3*MYCOL+1,':',3*MYCOL+NB,') = ',B 200  format((a2,I3,a1,I3,a4,3(f10.2,2x)))**     Release the process grid and exit BLACS* 100  CALL BLACS_GRIDEXIT( ICTXT )      CALL BLACS_EXIT( 0 )      STOP      END`
Stan Tomov

Posts: 13
Joined: Thu Dec 09, 2004 1:28 pm

Firstly, thank you Stan for the easy to follow reply, I am relatively new to LAPACK, so your sample code is of great assistance.

I think I am a little confused as to the definition of the banded A matrix being 'NxN' in PDGBTRF/S. Does this mean the original A matrix (before it gets stored in banded form) is square, or does it mean when stored in banded form it must be square? I think you mean the former, but I was a little confused by the documentation.

If you do indeed mean the former, then I think my problem should be able to be solved. I am solving a finite element system A*x=b, where A is originally a square (NxN) matrix but is stored in banded form (bwl=bwu~=80, N~=3000), and b is a vector of length N. Therefore I think (please confirm) that I am indeed solving with a 'NxN' A matrix.

I will check my code against your sample code to try and track down where I am going wrong.

Thanks again for yout excellent reply.

Ashton
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Stan, I found an interesting similarity - when I compile and run your sample code I get the same errors I get with my finite element code.

The compile statement is

\$ pgf90 -Mscalapack -C -o testpdgbtrfs.opt testpdgbtrfs.f

and the run command is

\$ mpirun -np 3 testpdgbtrfs.opt

I get the following

p0_31312: p4_error: interrupt SIGFPE: 8
Killed by signal 2.
Killed by signal 2.
/usr/pgi/linux86-64/5.2/bin/mpirun: line 1: 31312 Broken pipe /home/ape20/fwdsolvers/testing/testpdgbtrfs.opt -p4pg /home/ape20/fwdsolvers/testing/PI31194 -p4wd /home/ape20/fwdsolvers/testing

I have narrowed the problem down to the PDGBTRS call i.e. it is this call that causes these error messages in both your sample code and my own code.

Please advise if you think that this is perhaps a SCALAPACK bug, or alternatively if there is something in my compile and/or run commands that is incorrect.

FYI, our system is as follows:

Ten node Beowulf cluster
Dual AMD Opteron 244 Processors, 2GB ram per node
Rocks Linux 3.3
Portland Group PGI CDK 5.2

Ashton
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Hi Ashton,
Yes, the original matrix A has to be square (NxN). The
local are in packed format of size DESCA( 6 ) x DESCA( 4 ).
You can not have varying DESCA(4) for the different processors.
If for example N = 10 and you distribute among 3 processors (see the
example below), NB = DESCA(4) should be 4. The "first" 2 processors
will have 4 columns of A each, and the last one will have 2.

I compile and run the example without the problems that you have.
I link with scalapack, blas, atlas, mpi, and several blacs libraries.
The easiest way to figure out how to compile is to go to
your main scalapack directory and do
make exe
This will create your testing routines. They should work and
if they do, link your program as done in creating the testing
routines. I hope this will work.
Below is one more example.
Stan

P.S. I compile and run with something like
ifort -o testing PDGBTRF_test2.f libscalapack.a blacsF77init_MPI-LINUX-0.a blacs_MPI-LINUX-0.a blacsF77init_MPI-LINUX-0.a libf77blas.a libmpich.a
mpirun -np 3 testing

Code: Select all
`      PROGRAM PDGBTRF_EXAMPLE**     This is an example of using PDGBTRF and PDGBTRS.*     A matrix of size 10x10 is distributed on a 1x3 process *     grid, factored, and solved in parallel.*     *      INTEGER          ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW      INTEGER          BWL, BWU, N, NB, LAF, LWORK      PARAMETER       (BWL = 1, BWU = 1, NB = 4, N = 10)      PARAMETER       (LAF=(NB+BWU)*(BWL+BWU)+6*(BWL+BWU)*(BWL+2*BWU))      PARAMETER       (LWORK = LAF)      INTEGER          DESCA(7), DESCB(7), IPIV(NB), FILL_IN      DOUBLE PRECISION A(1+2*BWL+2*BWU,NB),B(NB),AF(LAF),WORK(LWORK)*     *     INITIALIZE THE PROCESS GRID*      NPROW = 1      NPCOL = 3      CALL SL_INIT( ICTXT, NPROW, NPCOL )      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )**     DISTRIBUTE THE MATRIX ON THE PROCESS GRID*     Initialize the array descriptors for the matrices A and B*      DESCA( 1 ) = 501                   ! descriptor type      DESCA( 2 ) = ICTXT                 ! BLACS process grid handle      DESCA( 3 ) = N                     ! number of rows in A      DESCA( 4 ) = NB                    ! Blocking factor of the distribution      DESCA( 5 ) = 0                     ! size of block rows       DESCA( 6 ) = 1+2*BWL+2*BWU         ! leading dimension of A      DESCA( 7 ) = 0                     ! process row for 1st row of A      DESCB( 1 ) = 502                   ! descriptor type      DESCB( 2 ) = ICTXT                 ! BLACS process grid handle      DESCB( 3 ) = N                     ! number of rows in B      DESCB( 4 ) = NB                    ! Blocking factor of the distribution      DESCB( 5 ) = 0                     ! size of block rows       DESCB( 6 ) = NB                    ! leading dimension of B      DESCB( 7 ) = 0                     ! process row for 1st row of B**     Generate matrices A and B and distribute them to the process grid**              P1           P2           P3         X        B**      / 1   1         .               .      \  / 1 \    / 2 \*      | 1   1   1     .               .       | | 1 |    | 3 |*      |     1   1   2 .               .       | | 1 |    | 4 |*      |         1   2 . 2             .       | | 1 |    | 5 |*      |             2 . 2   2         .       | | 1 | =  | 6 |*      |               . 2   2   2     .       | | 1 |    | 6 |*      |               .     2   2   3 .       | | 1 |    | 7 |*      |               .         2   3 . 3     | | 1 |    | 8 |*      |               .             3 . 3   3 | | 1 |    | 9 |*      \               .               . 3   3 / \ 1 /    \ 6 /*                                         *      FILL_IN = BWL+BWU      IF (MYCOL == 0) THEN         A(2+FILL_IN,1) = 1.d0         A(3+FILL_IN,1) = 1.d0         A(1+FILL_IN,2) = 1.d0         A(2+FILL_IN,2) = 1.d0         A(3+FILL_IN,2) = 1.d0         A(1+FILL_IN,3) = 1.d0         A(2+FILL_IN,3) = 1.d0         A(3+FILL_IN,3) = 1.d0         A(1+FILL_IN,4) = 2.d0         A(2+FILL_IN,4) = 2.d0         A(3+FILL_IN,4) = 2.d0         B(1) = 2.d0         B(2) = 3.d0         B(3) = 4.d0         B(4) = 5.d0      ELSE IF (MYCOL == 1) THEN         A(1+FILL_IN,1) = 2.d0         A(2+FILL_IN,1) = 2.d0         A(3+FILL_IN,1) = 2.d0         A(1+FILL_IN,2) = 2.d0         A(2+FILL_IN,2) = 2.d0         A(3+FILL_IN,2) = 2.d0         A(1+FILL_IN,3) = 2.d0         A(2+FILL_IN,3) = 2.d0         A(3+FILL_IN,3) = 2.d0         A(1+FILL_IN,4) = 3.d0         A(2+FILL_IN,4) = 3.d0         A(3+FILL_IN,4) = 3.d0         B(1) = 6.d0         B(2) = 6.d0         B(3) = 7.d0         B(4) = 8.d0      ELSE IF (MYCOL == 2) THEN           A(1+FILL_IN,1) = 3.d0         A(2+FILL_IN,1) = 3.d0         A(3+FILL_IN,1) = 3.d0         A(1+FILL_IN,2) = 3.d0         A(2+FILL_IN,2) = 3.d0         B(1) = 9.d0         B(2) = 6.d0      END IF**     Perform LU factorization*      CALL PDGBTRF( N, BWL, BWU, A(1,1), 1, DESCA, IPIV,     \$     AF, LAF, WORK, LWORK, INFO )      IF (INFO/=0) THEN         write(*,*) 'Info flag from PDGBTRF = ',INFO, ', Col = ',MYCOL         GOTO 100       END IF**     Solve using the LU factorization from PDGBTRF*      CALL PDGBTRS('N', N, BWL, BWU, 1, A(1,1), 1, DESCA, IPIV,      \$     B(1), 1, DESCB, AF, LAF, WORK, LWORK, INFO)       IF (INFO/=0) THEN         write(*,*) 'Info flag from PDGBTRS = ',INFO, ', Col = ',MYCOL         GOTO 100      END IF      write(*,200) 'X(processor ',MYCOL,') = ',B(1:DESCB(4)) 200  format((a12,I3,a4,4(f10.2,2x)))**     Release the process grid and exit BLACS* 100  CALL BLACS_GRIDEXIT( ICTXT )      CALL BLACS_EXIT( 0 )      STOP      END`
Last edited by Stan Tomov on Thu Feb 10, 2005 3:54 pm, edited 2 times in total.
Stan Tomov

Posts: 13
Joined: Thu Dec 09, 2004 1:28 pm

I have sent a support request to the Portland Group about this problem, maybe I am doing something wrong when compiling, or maybe there is a problem with the PGI compilers.

If anyone reading this thread has Portland Group compilers, would they mind compiling and running Stan's example code to test if it works? If possible, use the same compile statement as I did.

The PGI User's Guide says that when I use the option -Mscalapack, the following libraries are linked:

scalapack.a
blacsCinit_MPI-LINUX-0.a
blacs_MPI-LINUX-0.a
blacsF77init_MPI-LINUX-0.a
libblas.a
libmpich.a

Stan, do you see that there is anything missing here that might cause the problem I am having?

Thanks again,

Ashton
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

update

A brief update...

I (finally) got a reply from PGI:

Here is what we currently have determined.

The actual failure is caused by a divide by zero in the MOD function
at line 542 of the scalapack file "pdgbtrs.f". However, the variable
NB should not be zero here. I appears that one of the three processes
is mangling the values of the DESCA array in the PDGBTRF, even though
DESCA is not referenced in the part of the code where mangling occurs.

This smells like an array stepping on another piece of data, or
a compiler bug, but that seems unlikely when compiling -O0 or -g
still causes it. The error still occurs.

We are continuing to investigate.

They seem to think it is either a bug in the compiler or bad code.

I will update this thread when I get a resolution.

Ashton
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Just a quick update: We have received no further response from PGI, and are no longer using their compilers.
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm

Stan,

I am now testing the first sample code you provided (9x9 matrix) with the g77 compilers, using the ACML 2.5 math libraries (including ACML Scalapack). I am hoping that it will work where the PGI compilers did not.

However, when I compile and run your code (copied verbatim) I get the following error:

% mpirun -np 3 testpdgbtrsG77.opt

{ 0, 0}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 0
{ 0, 1}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 1
{ 0, 2}: On entry to PDGBTRS parameter number 1101 had an illegal value
Info flag from PDGBTRS = -1101, Col = 2

I would assume that there is nothing wrong with your code, so can you suggest what might be causing this error code?

Thanks,

Ashton
ashtonaut

Posts: 13
Joined: Thu Jan 27, 2005 6:53 pm