Bug report of memory leak for ScaLAPACK routine PCGEMR2D


Postby phymilton » Wed Jun 21, 2006 11:26 am

Bug report of a memory leak in the ScaLAPACK subroutine PCGEMR2D (or of my incorrect usage). I also tried PZGEMR2D, and the identical problem occurs.

In my program I have to call PCGEMR2D many times to distribute and redistribute data between global matrices. Calling PCGEMR2D a few times is fine, but when I call it too many times the program simply crashes. To show that the problem really lies in PCGEMR2D, I wrote a very simple test program in which PCGEMR2D is the only routine called inside the loop (the source code is attached).

In the test program I set up several pause points and read the memory usage with the Linux top command. I can see the memory usage accumulate as more and more PCGEMR2D calls are made.
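(As a side note, instead of pausing the run and reading top by hand, the resident memory can also be printed from inside the program. The helper below is only my own sketch of that idea, assuming a Linux system where /proc/self/status exists; the name print_rss and unit 77 are my own choices, not part of ScaLAPACK.)

Code: Select all
            subroutine print_rss(rank)
              implicit none
              integer, intent(in) :: rank
              character(len=256) :: line
              integer :: ios
              ! VmRSS in /proc/self/status is the resident set size of
              ! this process; print the line that contains it
              open(77, file='/proc/self/status', status='old', iostat=ios)
              if (ios .ne. 0) return
              do
                 read(77, '(A)', iostat=ios) line
                 if (ios .ne. 0) exit
                 if (line(1:6) .eq. 'VmRSS:') then
                    print *, 'rank', rank, ' ', trim(line)
                    exit
                 endif
              enddo
              close(77)
            end subroutine print_rss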

Once enough memory has accumulated, the program crashes with the following error message:

[2] Abort: VAPI_register_mr at line 65 in file collutils.c
Timeout alarm signaled
Cleaning up all processes ...forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
done.


In my test program, the crash occurred after I had entered 3 integers, which corresponds to a total loop count between 1500 and 2000.

My computer setup is as follows:

Intel Xeon EM64T processor
2 GB memory
Intel Fortran compiler V8.01
MPICH
InfiniBand communication network
Intel Math Kernel Library Cluster version 8.0


Source code:

Code: Select all
          implicit none

            integer, parameter :: M = 324, N = 162, mb = 64, nb = mb
            integer, parameter :: rsrc = 0, csrc = 0,  dlen_ = 9

            integer :: ictxt, rank, tproc, icroot
            integer :: prow, pcol, nprow, npcol, myrow, mycol
 
            integer, external ::  numroc

            integer :: mxlocr_M, mxlocc_M, mxlld_M, desc_M(dlen_)
            integer :: mxlocr_N, mxlocc_N, mxlld_N, desc_N(dlen_)
            complex, allocatable :: A(:,:), A1(:,:), A2(:,:),A3(:,:),A4(:,:)
            real, allocatable :: B(:,:)
           
            integer :: i, ppp, info
           
           
            ! initialize the BLACS and set up an nprow x npcol process grid
            call blacs_pinfo(rank, tproc)
            call blacs_get(-1,0,ictxt)
            nprow = int(sqrt(real(tproc+0.0001)))
            npcol = tproc/nprow
 
            call blacs_gridinit(ictxt,'r',nprow, npcol)
            call blacs_gridinfo(ictxt,nprow,npcol,myrow,mycol)
            print *, 'rank',rank, 'myrow', myrow, 'mycol',mycol
           
           
            ! local dimensions and descriptor for the M x M distributed matrix
            mxlocr_M = numroc(M,nb,myrow,rsrc,nprow)
            mxlocc_M = numroc(M,nb,mycol,csrc,npcol)
            mxlld_M = max(1,mxlocr_M)
            call descinit(desc_M,M,M,nb,nb,rsrc,csrc,ictxt,mxlld_M,info)
            if (info .NE. 0 ) then
                   print *, 'descinit error, info =',info
                   stop
            endif                         
       
            ! local dimensions and descriptor for the N x N distributed matrices
            mxlocr_N = numroc(N,nb,myrow,rsrc,nprow)
            mxlocc_N = numroc(N,nb,mycol,csrc,npcol)
            mxlld_N = max(1,mxlocr_N)
            call descinit(desc_N,N,N,nb,nb,rsrc,csrc,ictxt,mxlld_N,info)
            if (info .NE. 0 ) then
                     print *, 'descinit error, info =',info
                     stop
            endif 
           
            allocate(A(mxlld_M,mxlocc_M))
            allocate(B(mxlld_M,mxlocc_M))
                       
            allocate(A1(mxlld_N,mxlocc_N))
            allocate(A2(mxlld_N,mxlocc_N))
            allocate(A3(mxlld_N,mxlocc_N))
            allocate(A4(mxlld_N,mxlocc_N))                                   

           
            call random_number(B)
           
            A=cmplx(B,(rank+1)*2.8*B)
           
           
            ! repeatedly split A into four N x N blocks and reassemble it,
            ! so that PCGEMR2D is called 8 times per iteration
            DO i=1,3000
               CALL PCGEMR2D(N,N,A,1,1,desc_M,A1,1,1,desc_N, ictxt)           
               CALL PCGEMR2D(N,N,A,1+N,1,desc_M,A2,1,1,desc_N, ictxt)
               CALL PCGEMR2D(N,N,A,1,1+N,desc_M,A3,1,1,desc_N, ictxt)
               CALL PCGEMR2D(N,N,A,1+N,1+N,desc_M,A4,1,1,desc_N, ictxt)                                 


               CALL PCGEMR2D(N,N,A1,1,1,desc_N,A,1,1,desc_M, ictxt)           
               CALL PCGEMR2D(N,N,A2,1,1,desc_N,A,1+N,1,desc_M, ictxt)
               CALL PCGEMR2D(N,N,A3,1,1,desc_N,A,1,1+N,desc_M, ictxt)
               CALL PCGEMR2D(N,N,A4,1,1,desc_N,A,1+N,1+N,desc_M, ictxt) 
               
               if (mod(i,500) .eq. 0) then               
                  print *,'iteration:',i,A1(10,10)
                  if (rank .eq. 0) then
                     print *, 'Please enter an integer after reading the memory with the <top> command'
                     read(*,*)ppp
                  endif
                     
                  call blacs_barrier(ictxt, 'A')
               endif
            ENDDO
           
            deallocate(A,A1,A2,A3,A4,B)               
            call blacs_gridexit(ictxt)
            call blacs_exit(0)               
            END

Postby phymilton » Thu Jun 22, 2006 1:48 pm

Is it possible that this is due to my computer configuration, for example how MPICH is installed or some memory-management setting? I am not familiar with those settings and just use the compiler and the math libraries.

Does ScaLAPACK have subroutines to automatically release unused memory?
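The only routine I have found that looks related is the BLACS support routine BLACS_FREEBUFF, which asks the BLACS to release its internal buffer. I do not know whether it has any effect on the memory that PCGEMR2D accumulates, so the call below is only a guess on my part:

Code: Select all
            ! ask the BLACS to free its internal buffer for this context;
            ! the second argument tells the BLACS whether to wait for any
            ! outstanding asynchronous operations before freeing it
            call blacs_freebuff(ictxt, 1)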

Thank you very much!!

Problem solved by using pcgeadd

Postby phymilton » Fri Jun 23, 2006 1:30 pm

With the help of the LAPACK/ScaLAPACK developers: the call to PCGEMR2D can be replaced by PCGEADD if the two matrices have the same context (i.e., the same process grid). For example, to copy matrix A to matrix B we can use:


Code: Select all
            complex :: alpha, beta

            alpha = (1.0, 0.0)
            beta  = (0.0, 0.0)

            call pcgeadd('N', m, n, alpha, A, ia, ja, descA, beta, B, ib, jb, descB)


Here ia, ja, ib, and jb can be arbitrary, so part of the global matrix can be copied from or copied to. As a bonus, the program runs faster after replacing PCGEMR2D.
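Applied to the test program above, the A and A1 copies in the loop could be written with PCGEADD roughly as follows (just my own sketch; the scalars must be single-precision complex for the C-prefixed routine, and I am assuming PCGEADD accepts the different layouts of desc_M and desc_N because they share the same context):

Code: Select all
            complex :: alpha, beta

            alpha = (1.0, 0.0)
            beta  = (0.0, 0.0)

            ! copy the leading N x N block of A (desc_M) into A1 (desc_N)
            call pcgeadd('N', N, N, alpha, A, 1, 1, desc_M, beta, A1, 1, 1, desc_N)

            ! copy A1 back into the leading N x N block of A
            call pcgeadd('N', N, N, alpha, A1, 1, 1, desc_N, beta, A, 1, 1, desc_M)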

As for the problem in PCGEMR2D itself, the LAPACK/ScaLAPACK developers suggested using LAM MPI instead of MPICH as the communication library; I am going to test this later.

Many thanks to Eduardo for his help and suggestions.

Postby Julie » Mon Jun 26, 2006 12:18 pm

phymilton,

From http://www.lam-mpi.org/:
Code: Select all
LAM/MPI is now in a maintenance mode. Bug fixes and critical patches are still being applied, but little real "new" work is happening in LAM/MPI. This is a direct result of the LAM/MPI Team spending the vast majority of their time working on our next-generation MPI implementation: Open MPI


So it might be better for you to use Open MPI: http://www.open-mpi.org/
Open MPI is a project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available.

Sincerely
Julie

Postby phymilton » Tue Jun 27, 2006 2:20 pm

Thanks a lot, I will try Open MPI.

