Hello Desheng,
yes there seems to have a problem with PxGETRI with rectangular grids when
one uses workspace query to get the optimal size of IWORK. The value
returned in IWORK(1) after a workspace query (LWORK=1) is too small. The
documentation gives the same workspace size as the one returned, so the
documentation of the routine seems to be incorrect as well.
Here is a temporary fix. It will need more testing to be validated but
this works fine on one of our cluster. Let me know if this works as well
for you.
Replace Line 221222 of pdgetri.f
 LIWMIN = NQ + MAX( ICEIL( ICEIL( MP, DESCA( MB_ ) ),
 $ LCM / NPROW ), DESCA( NB_ ) )
with the following seven lines
+ LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
+ $ + MOD ( IA  1, MB_P ), DESCA ( NB_ ), MYCOL,
+ $ DESCA( CSRC_ ), NPCOL ) + MAX ( DESCA( MB_ )
+ $ * ICEIL ( ICEIL( NUMROC( DESCA( M_ )
+ $ + DESCA( MB_ ) * NPROW, DESCA( MB_ ), MYROW,
+ $ DESCA( RSRC_ ), NPROW ), DESCA( MB_ ) ),
+ $ LCM / NPROW ), DESCA( NB_ ) )
I have tested matrices of size n=[550], with nb=[13 32] for all the grids
possible with less than 16 processors ( 11, 12, 21, 13, 31, 14, 22,
41, 15, .... , 161), the routine PDGETRI with this modification works
fine, whereas the previous one was only working on square matrices if one
was using the minimum LIWORK. You were right.
So, temporary fix, you compile pdgetri.f with your fortran compiler, link
with scalapack but put pdgetri.o before the libscalapack.a library and
this should work.
We'll take a bit of time here to have a second look and certainly release
a more official patch soon.
Feel free for question, remarks, comments.
Julien
