ScaLAPACK Archives

[Scalapack] Bug in BLACS with return datatypes

Correct. Reference BLAS and LAPACK (and GotoBLAS2) work without errors. MKL
generates the following errors in cladiv/zladiv.


  P       :             1     2     1     4
  Q       :             1     2     4     1

Relative machine precision (eps) is taken to be       0.596046E-07
Routines pass computational tests if scaled residual is less than   5.0000

QR factorization tests.

TIME      M      N  MB  NB     P     Q Fact Time      MFLOPS  CHECK  Residual
---- ------ ------ --- --- ----- ----- --------- ----------- ------  --------

[dancer00:18681] *** Process received signal ***
[dancer00:18681] Signal: Segmentation fault (11)
[dancer00:18681] Signal code: Address not mapped (1)
[dancer00:18681] Failing at address: 0xbf2b1b60
[dancer00:18681] [ 0] /lib64/libpthread.so.0(+0xf500) [0x7f0a78357500]
[dancer00:18681] [ 1] /opt/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so(mkl_lapack_cladiv+0x11) [0x7f0a78bb9c41]
[dancer00:18681] [ 2] ./xcqr(pclarfg_+0x8d9) [0x438681]
[dancer00:18681] [ 3] ./xcqr(pcgeqr2_+0x91a) [0x46c0be]
[dancer00:18681] [ 4] ./xcqr(pcgeqrf_+0x576) [0x42f902]
[dancer00:18681] [ 5] ./xcqr(MAIN__+0x25d9) [0x407ded]
[dancer00:18681] [ 6] ./xcqr(main+0x2a) [0x476dea]
[dancer00:18681] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f0a770b1cdd]
[dancer00:18681] [ 8] ./xcqr() [0x405759]
[dancer00:18681] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 18681 on node d00 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


The Schur decomposition fails in a different way, as soon as the number of
processes is greater than 1:
WALL     2   6    1    1     0.00     1.53 FAILED
||H - Q*S*Q^T|| / (||H|| * N * eps) =              2347763.




On 13 August 2013, at 12:06, "Langou, Julien" <Julien.Langou@Domain.Removed>
wrote:


Hello Aurelien, so you are saying that if you link with a LAPACK compiled
from scratch, this works, but if you use MKL, it does not. Am I correct?
Julien.


On 8/9/13 4:57 PM, "Aurélien Bouteiller" <bouteill@Domain.Removed> wrote:


I get the following issues on both MVAPICH and Open MPI
(compiled with gcc + MKL 13.1):
LAPACK supplied by user is WORKING, will use
-L/opt/intel/composer_xe_2013.3.163/mkl -lmkl_lapack95_lp64
-lmkl_gnu_thread -lmkl_gf_lp64 -lmkl_core -liomp5 -lpthread -lm.

The following tests FAILED:
      61 - xcqr (Failed)
      62 - xzqr (Failed)
      65 - xcbrd (Failed)
      66 - xzbrd (Failed)
      69 - xchrd (Failed)
      70 - xzhrd (Failed)
      73 - xctrd (Failed)
      74 - xztrd (Failed)
      79 - xcsep (Failed)
      80 - xzsep (Failed)
      83 - xcgsep (Failed)
      84 - xzgsep (Failed)
      93 - xcheevr (Failed)
      94 - xzheevr (Failed)
Errors while running CTest

Not all c/z precision tests fail, but only complex tests fail. Is this a
known issue?


On 30 July 2013, at 10:47, "Langou, Julien" <Julien.Langou@Domain.Removed>
wrote:


Thank you, Aurelien. I think we have been circling around this bug for a
while without ever finding it. Indeed, my codes that use Cblacs2sys_handle
sometimes work and sometimes do not; it depends. Thank you very much.
I hope this will fix my codes. Regards, Julien.


On 7/29/13 1:43 PM, "Aurélien Bouteiller" <bouteill@Domain.Removed> wrote:

Hey Julie :) 

The following patch solves the bug with Open MPI. It follows the "legacy"
practice of redeclaring a function's prototype in every .c file that uses
it.


Index: BLACS/SRC/blacs_map_.c
===================================================================
--- BLACS/SRC/blacs_map_.c (revision 193)
+++ BLACS/SRC/blacs_map_.c (working copy)
@@ -7,7 +7,7 @@
                         int *npcol0)
#endif
{
-
+   MPI_Comm Cblacs2sys_handle(int BlacsCtxt);
 MPI_Comm BI_TransUserComm(int, int, int *);

 int info, i, j, Iam, *iptr;


Aurelien 



On 29 July 2013, at 12:32, julie langou <julie@Domain.Removed> wrote:

Thank you very much, Aurelien, for reporting the problem.
Solution 3 does indeed seem to be the most appropriate. We are going to add
it to our "to-do" list.
If you could send me the patch, I would apply it in our repository as a
temporary fix.
Regards,
Julie

On Jul 25, 2013, at 7:44 AM, Aurélien Bouteiller <bouteill@Domain.Removed>
wrote:

Hey guys, 

I think I found an original 1996 bug :)

The following functions in BLACS return a "non integer" datatype. The
problem is that BLACS doesn't have a .h file, so any function called
without a local prototype is implicitly declared as returning int
("int func(int)"). This has worked in the past because most of the time the
returned datatype was int compatible. This is not the case anymore: in
Open MPI in particular, MPI_Comm is a pointer (64-bit) and therefore
doesn't fit in a 32-bit int.
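
A minimal sketch of the failure mode described above, assuming an int is 32
bits wide and the handle is 64 bits wide. It does not call MPI or BLACS; the
function names are hypothetical and only model what happens to a value that
is read back through an int-sized return slot (possible sign extension on
real platforms is ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Model of an int-sized return slot: only the low 32 bits of a
 * 64-bit handle value survive. (Hypothetical helper, not BLACS code.) */
uint64_t through_int_return(uint64_t handle_bits)
{
    return handle_bits & UINT64_C(0xFFFFFFFF);
}

/* Returns 1 when the handle value is unchanged after passing through
 * the int-sized slot, 0 when the upper bits were lost. */
int survives_int_return(uint64_t handle_bits)
{
    return through_int_return(handle_bits) == handle_bits;
}

/* Small self-check: a small integer-like handle survives, while a
 * pointer-like value (illustrative 64-bit address) does not. */
void check_model(void)
{
    assert(survives_int_return(UINT64_C(42)));
    assert(!survives_int_return(UINT64_C(0x00007f0a78357500)));
}
```

This is why integer-valued handles appear to work while pointer-valued
handles come back corrupted.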

MPI_Datatype GetMpiGetType()
MPI_Datatype GetMpiGetType()
MPI_Comm Cblacs2sys_handle()
double Cdwalltime00()

There are several possible courses of action from there.
1. Give all functions a prototype in Bdef.h. That would solve all the bugs
internal to ScaLAPACK, but would still leave users vulnerable.
2. Change the prototypes so that results are passed back through pointer
arguments instead of return values. This could break backward
compatibility.
3. Have BLACS/ScaLAPACK expose a full .h API file to users (in a sense, a
public Bdef.h).
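
The options above can be sketched as follows. This is only an illustration
under stated assumptions: demo_comm_t, demo_ctxt_to_comm and
demo_ctxt_to_comm_out are hypothetical names standing in for MPI_Comm and
the BLACS routines, and the table-based implementation is a toy. The two
declarations near the top are what a shared header would carry (options 1
and 3); the second function shows the out-parameter style of option 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque, pointer-sized handle. */
struct demo_comm { int id; };
typedef struct demo_comm *demo_comm_t;

/* Options 1 and 3: a prototype visible to every caller, so the
 * compiler knows the function returns a pointer, not an int. */
demo_comm_t demo_ctxt_to_comm(int ctxt);

/* Option 2: the result is written through a pointer argument, so the
 * return type no longer matters to callers. */
void demo_ctxt_to_comm_out(int ctxt, demo_comm_t *out);

/* Toy implementation: a fixed table indexed by context number. */
#define DEMO_MAX_CTXT 4
static struct demo_comm demo_table[DEMO_MAX_CTXT] = { {0}, {1}, {2}, {3} };

demo_comm_t demo_ctxt_to_comm(int ctxt)
{
    if (ctxt < 0 || ctxt >= DEMO_MAX_CTXT)
        return NULL;               /* unknown context */
    return &demo_table[ctxt];
}

void demo_ctxt_to_comm_out(int ctxt, demo_comm_t *out)
{
    assert(out != NULL);
    *out = demo_ctxt_to_comm(ctxt);
}
```

With the prototype in scope, both styles hand back the same full-width
handle; the out-parameter form changes every call site, which is the
backward-compatibility cost mentioned in option 2.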


I have done the Bdef.h fix. If you are interested, I can contribute it.

Aurelien 

--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 309b
* Knoxville, TN 37996
* 865 974 9375







_______________________________________________
Scalapack mailing list
Scalapack@Domain.Removed
http://lists.eecs.utk.edu/mailman/listinfo/scalapack



