Hello Jill,
I am glad that you find 1.8.0 better than 1.7.0. The list of changes is
at: http://www.netlib.org/scalapack/scalapack1.8.0.changes .
The release was quite substantial, although most of the changes are bug
fixes, which are hard to showcase.
Thanks for the bug report for pdgecon.f. I believe you are right. When the
local number of rows is zero, NPMOD is 0 (line 231), so LIWORK is 0
(line 241). If you then allocate IWORK of size LIWORK (that is, 0), you are
going to be in trouble, for example at line 242.
I did not test this, as I do not have a working cluster anymore. (Our
10-year-old Pentium III cluster officially died during the summer break!)
Can you replace line 241 of pdgecon.f with
LIWMIN = MAX( 1, NPMOD )
and confirm that this does the job? I believe the workspace computation
for WORK is OK.
If you confirm success, we will patch our version.
(( Action:
* change line 241 in PDGECON
* change comments of LIWORK (LIWORK needs to be at least 1)
* carry the changes over to the C, Z, S variants
))
If you cannot work on this, that's fine. We will work it out on our
side.
Thanks for the bug report in any case.
Best wishes,
julien.
On Fri, 28 Aug 2009, Jill Reese wrote:
Hello Julien. My name is Jill Reese, and I am a developer with MATLAB's
Parallel Computing Toolbox. At our last teleconference with the
ScaLAPACK team I was asked to provide some examples of when we have had
to work around ScaLAPACK's handling of small matrices. One issue that
has appeared in several functions (PDGECON, and older versions of PDGESVD
certainly) is that failure occurs when we ask ScaLAPACK to work with a
matrix that has been distributed such that the process grid has rows with
no data.

For example, consider a matrix distributed using the default block size.
I want to call PDGECON to calculate the reciprocal condition number, so
I first call the function with LWORK=1 and LIWORK=1 to perform a
workspace query. Then using the output stored in WORK and IWORK, I
create my workspace arrays and call PDGECON with those arrays to do the
actual calculation.

The actual condition number calculation segfaults whenever there are rows
in the process grid with no data. As an example, consider the following
study regarding computing the condition number of an N x N identity
matrix distributed using the default block size. The results are
independent of the process grid orientation.

A. One Processor: no error, independent of N
B. Two Processors
   a. 2 x 1 process grid
      i.  SEGFAULT for N = 64
      ii. No error for N = 65
   b. 1 x 2 process grid: no error for N = 64, 65
C. Three Processors
   a. 3 x 1 process grid
      i.  SEGFAULT for N = 64, 65, 128
      ii. No error for N = 129
   b. 1 x 3 process grid: no error for N = 64, 65, 128, 129
D. Four Processors
   a. 4 x 1 process grid
      i.  SEGFAULT for N = 64, 65, 128, 129, 192
      ii. No error for N = 193
   b. 1 x 4 process grid: no error for N = 64, 65, 128, 129, 192, 193
   c. 2 x 2 process grid
      i.  SEGFAULT for N = 64
      ii. No error for N = 65

As you can see from the above, ScaLAPACK has no problem handling the
situation where a column of the process grid has no data. It is when a
row in the process grid has no data that we see a problem.
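
The pattern in the study above is what a block-cyclic row distribution would predict if the default block size is 64. That value is an assumption inferred from the thresholds at N = 64, 128 and 192; the message does not state it. The following is a minimal Python sketch, not ScaLAPACK code: it ports the usual NUMROC local-size formula, assumes the matrix starts at process row 0, and checks whether any process row owns zero rows. The helper name `has_empty_process_row` is hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of a block-cyclically distributed
    dimension of size n, block size nb, owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                              # number of full blocks
    local = (nblocks // nprocs) * nb               # full rounds of blocks
    extra = nblocks % nprocs                       # leftover full blocks
    if mydist < extra:
        local += nb                                # one more full block
    elif mydist == extra:
        local += n % nb                            # the trailing partial block
    return local


def has_empty_process_row(n, nb, nprow):
    """True if at least one process row owns no matrix rows.
    Assumes the first block lives on process row 0 (an assumption)."""
    return any(numroc(n, nb, r, 0, nprow) == 0 for r in range(nprow))


NB = 64  # assumed default block size, inferred from the reported thresholds

for nprow, sizes in [(2, [64, 65]), (3, [64, 65, 128, 129]),
                     (4, [64, 65, 128, 129, 192, 193])]:
    for n in sizes:
        print(nprow, "process rows, N =", n,
              "empty row:", has_empty_process_row(n, NB, nprow))
```

Under these assumptions, a grid with 3 process rows has an empty row for N = 64, 65 and 128 but not for N = 129, which matches the report. The number of process columns does not enter the check, which fits the clean 1 x P runs and the 2 x 2 case behaving like 2 x 1.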

A workaround we have found is:
(1) Perform a workspace query using PDGECON
(2) Adjust the size of the workspace to be max(1, result of query)
(3) Compute the condition number by calling PDGECON with the adjusted
    workspace
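
Step (2) of the workaround can be sketched as below. This is only an illustration in Python of the size adjustment on the caller's side; the function names are hypothetical, the actual PDGECON calls in steps (1) and (3) are not shown, and the query results are assumed to be the values returned in WORK(1) and IWORK(1).

```python
def clamp_workspace_size(queried_size):
    """Step (2): never allocate fewer than one element.
    The queried size may come back as a float (WORK is a real array)."""
    return max(1, int(queried_size))


def allocate_workspaces(work_query, iwork_query):
    """Build workspace arrays from the sizes reported by the query.
    Hypothetical helper: returns plain Python lists as stand-ins for
    the WORK and IWORK arrays passed to the real computation."""
    lwork = clamp_workspace_size(work_query)
    liwork = clamp_workspace_size(iwork_query)
    work = [0.0] * lwork
    iwork = [0] * liwork
    return work, iwork


# A process row with no local rows: the query reports an IWORK size of 0.
work, iwork = allocate_workspaces(130.0, 0)
print(len(work), len(iwork))
```

With the clamp in place, a process whose query reports zero still gets a one-element array, so a write to the first element stays in bounds.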

As an aside, we saw the same problem in PDGESVD in ScaLAPACK version 1.7,
but the problem seems to be corrected for that function in version 1.8
(although I don't know where in the code).

Please feel free to contact me for any clarification.

Regards,

Jill

Jill Reese, Ph.D.
Senior Software Engineer
The MathWorks, Inc.
(508) 6473907
