It works! I did something like what Ed just suggested. Since I already have the
correct output at each step from the LAPACK-based code, I compared the output at
each step of the ScaLAPACK-based code against it.
This helped me track down a few wrong distributions and a couple of
embarrassing typos. Now the difference between the final output of the parallel
version and that of the sequential version is ~0.001%, which I can live with.
This was a neat way of tracking down bugs in an implementation like this.
The tips and suggestions really helped.
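
A minimal sketch of this kind of step-by-step comparison, in Python for illustration only. The function names, the stage labels, and the 1e-5 tolerance (about 0.001%) are assumptions and are not taken from the actual code; each stage's output is assumed to have been gathered into a flat list of real or complex numbers.

```python
def max_relative_difference(reference, candidate):
    """Largest elementwise difference, scaled by the largest reference magnitude."""
    if len(reference) != len(candidate):
        raise ValueError("arrays must have the same length")
    # Fall back to 1.0 when the reference is all zeros, to avoid dividing by zero.
    scale = max(abs(x) for x in reference) or 1.0
    return max(abs(r - c) for r, c in zip(reference, candidate)) / scale


def first_diverging_stage(reference_stages, candidate_stages, tolerance=1e-5):
    """Return the name of the first stage whose outputs differ by more than
    `tolerance`, or None if every stage agrees.

    reference_stages: list of (stage_name, values) pairs, in execution order.
    candidate_stages: dict mapping stage_name to values.
    """
    for name, reference in reference_stages:
        if max_relative_difference(reference, candidate_stages[name]) > tolerance:
            return name
    return None


# Hypothetical data: the sequential run is the reference, the parallel run is checked.
sequential = [("factor", [4.0, 1.0, 0.25, 3.75]), ("invert", [2.0, 4.0])]
parallel = {"factor": [4.0, 1.0, 0.25, 3.75], "invert": [2.0, 4.1]}
print(first_diverging_stage(sequential, parallel))
```

The first stage reported is where to start looking, since every later stage inherits its error.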
For the pivot vector I'm allocating a storage size of (2 x mb), and it works
currently.
Thanks a lot, guys!!!
Suzy
On Monday, May 11, 2015 11:12 AM, "D'Azevedo, Ed F."
<dazevedoef@Domain.Removed> wrote:
Hi Suzy,
Another simple trick is to initialize the matrix with known values and print
out the matrices at intermediate steps using pclaprnt with different values of
output unit.
For example, use pcelset() to set A(i,j) = merge( n, 1, i.eq.j). You can
also generate the LU factorization or other matrix computations in MATLAB to
compare and see what went wrong and when.
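
As a rough illustration of the known-values idea, here is a small Python sketch (not ScaLAPACK code) that builds the same test matrix, with n on the diagonal and 1 elsewhere, and computes a reference LU factorization without pivoting. The helper names are made up for this example. Because this matrix is strictly diagonally dominant, partial pivoting should not swap any rows, so the factors should be comparable with getrf output up to rounding and precision.

```python
def known_matrix(n):
    """n on the diagonal, 1 elsewhere: the Fortran merge(n, 1, i.eq.j)."""
    return [[float(n) if i == j else 1.0 for j in range(n)] for i in range(n)]


def lu_no_pivoting(a):
    """Doolittle LU without row interchanges; returns (L, U) as new lists."""
    n = len(a)
    u = [row[:] for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]      # multiplier for row i
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]  # eliminate below the pivot
    return l, u


def matmul(x, y):
    """Plain triple-loop product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = known_matrix(4)
l, u = lu_no_pivoting(a)
# Largest entry of |L*U - A|; it should be at rounding-error level.
residual = max(abs(p - q) for rp, rq in zip(matmul(l, u), a) for p, q in zip(rp, rq))
print(residual)
```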
You might also double-check the amount of storage for the pivot vector ipivt(:)
in pcgetrf. Note that it needs extra MB_A storage.
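
To make the storage requirement concrete: the p?getrf documentation gives the IPIV dimension as LOCr(M_A) + MB_A, where LOCr is the local row count returned by NUMROC. The sketch below transcribes the NUMROC arithmetic into Python so the size can be checked by hand; `ipiv_length` is a hypothetical helper name, and the 450-row, 64-row-block example is only an assumption for illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of a block-cyclically distributed dimension owned by
    process `iproc`; a Python transcription of ScaLAPACK's NUMROC."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # blocks every process gets
    extra_blocks = nblocks % nprocs
    if mydist < extra_blocks:
        count += nb                                # one extra whole block
    elif mydist == extra_blocks:
        count += n % nb                            # the trailing partial block
    return count


def ipiv_length(m_a, mb_a, myrow, rsrc_a, nprow):
    """LOCr(M_A) + MB_A, the IPIV dimension from the p?getrf documentation."""
    return numroc(m_a, mb_a, myrow, rsrc_a, nprow) + mb_a


# Hypothetical case: 450 rows, row block size 64, 2 process rows.
for myrow in range(2):
    print(myrow, ipiv_length(450, 64, myrow, 0, 2))
```

On a 1-by-1 grid with MB_A equal to the full row count, this formula gives exactly 2 * MB_A.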
From: Susini de Silva <susini86@Domain.Removed>
ReplyTo: Susini de Silva <susini86@Domain.Removed>
Date: Saturday, May 9, 2015 at 7:21 AM
To: James Demmel <demmel@Domain.Removed>
Cc: James W DEMMEL <demmel@Domain.Removed>, Nick Knight
<knight@Domain.Removed>, SYD HASHEMI GHERMEZI <sydhashemi@Domain.Removed>,
julie <julie@Domain.Removed>, Orianna DeMasi <ODeMasi@Domain.Removed>,
"scalapack@Domain.Removed" <scalapack@Domain.Removed>, Hong Diep Nguyen
<hdnguyen@Domain.Removed>, Razvan Carbunescu <carazvan@Domain.Removed>, Igor
Kozachenko <igor175@Domain.Removed>
Subject: Re: [Scalapack] Scalapack errorneous output
Hi everyone!
Thanks for the quick responses :)
Ed, yes, I just tried a 1-by-1 process grid, using the full array dimensions as
block dimensions. The output is still incorrect. I was working on a 2-by-2 grid
before, so let me go through all the calls again on the 1-process grid and see
where it goes wrong.
To see output during debugging, I have been using MPI calls to collect the
contents from the other processes onto the root process.
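
For checking a distribution by hand, the one-dimensional block-cyclic mapping can be sketched as below. This is illustrative Python with 0-based indices and a made-up function name, not the ScaLAPACK tool routines themselves; the same mapping is applied independently to rows and to columns.

```python
def owner_and_local_index(g, nb, nprocs, src=0):
    """Map 0-based global index g to (owning process, 0-based local index)
    for block size nb over nprocs processes, first block on process src."""
    block = g // nb                          # which global block g falls in
    proc = (src + block) % nprocs            # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb  # earlier local blocks, plus offset
    return proc, local


# Block size 2 over 2 processes: global indices 0..7.
for g in range(8):
    print(g, owner_and_local_index(g, nb=2, nprocs=2))
```

With a single process and nb equal to the full dimension, every global index maps to itself on process 0, so local and global arrays coincide on a 1-by-1 grid.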
The code uses a combination of pcgetrf/pcgetri/pcgemm/pcgemv in 5 subroutines.
I have reduced the size of the problem (arrays of about 450 by 450 or smaller)
for this testing purpose. But before that I solved a simple 16-by-16 system
using all these calls, just to make sure I'm using them correctly, and it
worked well!
Jim, thanks for the tip on possible round off errors/algorithm differences.
Hopefully I'll find out whether this is the case during the current test.
I'll get back to you guys on how it goes...
Thanks again!
Suzy
On Wednesday, May 6, 2015 11:11 AM, James Demmel <demmel@Domain.Removed> wrote:
The reasons one may expect different answers from LAPACK and ScaLAPACK
include the facts that
(1) floating point operations may be done in a different order, due to
parallelism, and since floating point summation (for example) is not
associative, the rounding errors will differ, and you'll get different
answers, and
(2) in some cases different algorithms are used in LAPACK and ScaLAPACK, since
the most efficient parallel algorithm and the most efficient sequential
algorithm are not necessarily the same.
Depending on the condition number of your problem, either of these may lead to
large or small changes in the output. And of course there may be bugs somewhere
...
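
Point (1) can be seen in a few lines of Python (double precision; the values are chosen only to make the effect obvious, and the two-chunk split is a stand-in for a parallel reduction, not how ScaLAPACK actually partitions work):

```python
def running_sum(values):
    """Plain left-to-right summation, as a sequential loop would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_chunk_sum(values):
    """Sum each half separately, then combine, mimicking two processes."""
    half = len(values) // 2
    return running_sum(values[:half]) + running_sum(values[half:])


# Regrouping three terms already changes the last bit of the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# With widely varying magnitudes, the order changes the answer completely.
values = [1e20, 1.0, -1e20, 1.0]
print(running_sum(values), two_chunk_sum(values))
```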
Jim Demmel
On 5/6/15 7:54 AM, julie wrote:
Dear Suzy,
You can send it to the mailing list: scalapack@Domain.Removed (just reply to
my email).
Sincerely,
Julie
On May 5, 2015, at 12:42 PM, Susini de Silva <susini86@Domain.Removed> wrote:
Hi!
This is a question regarding the implementation of ScaLAPACK in an algorithm
that currently runs correctly using LAPACK.
I have been spending a good amount of time editing this very large code, which
uses multiple LAPACK calls, to use the corresponding ScaLAPACK calls instead.
I'm making sure the operand arrays are distributed and array descriptors are
used before every call. However, the final output of the ScaLAPACK version of
my code turned out to be significantly different from the final output of the
LAPACK version. So I decided to cross-compare the outputs after every
LAPACK/ScaLAPACK call. I do see a slight difference. Is this possible? Has
anyone seen this issue before, or do you know why this could happen?
Please, I'm at a dead end at the moment, and any comments/suggestions
would be greatly helpful!
I just made an account on Lapack User Forums and tried to post this. However it
was flagged as spam even before I submitted it. I'd be really grateful if you
can help me with this, or direct me to the right resources.
Thanks!
Suzy
_______________________________________________
Scalapack mailing list
Scalapack@Domain.Removed
http://lists.eecs.utk.edu/mailman/listinfo/scalapack
