ZHEEVD_GPU fails to converge

Post by Paul2822 » Sat Sep 14, 2013 2:24 pm

Hi all,

I'm running into some issues with the magma_zheevd_gpu function.
I have two test matrices, and zheevd converges for one but not for the other.

The error code (info) that I receive is 3394, which, according to the documentation, means that the algorithm failed to compute an eigenvalue while working on the submatrix lying in rows and columns 1 through 199.

The matrix for which zheevd is not converging is of size 3194.
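
For reference, here is a minimal sketch of how an info value like this can be decoded. It assumes MAGMA follows the LAPACK ZHEEVD convention for jobz = 'V', where a positive info encodes the submatrix in rows and columns info/(N+1) through mod(info, N+1). The helper name decode_zheevd_info is hypothetical and not part of MAGMA or LAPACK.

```python
def decode_zheevd_info(info, n):
    """Split a positive INFO from ZHEEVD (JOBZ = 'V') into the (first, last)
    row/column indices of the submatrix that was being processed on failure.

    Assumes the LAPACK convention: first = INFO / (N+1), last = mod(INFO, N+1).
    """
    if info <= 0:
        raise ValueError("only positive INFO values encode a submatrix")
    return info // (n + 1), info % (n + 1)


# Values reported in this thread: info = 3394 for a matrix of size N = 3194.
first, last = decode_zheevd_info(3394, 3194)
print(f"failure while working on rows/columns {first} through {last}")
```

With the numbers from this post, 3394 = 1 * 3195 + 199, so the decoded range is rows and columns 1 through 199.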
I investigated this problem in a little more detail and realized that MAGMA calls LAPACK's zheev if N <= 128, so I changed this threshold to N <= 4000 just to see whether the problem is due to my matrix or related to MAGMA's zheevd.
As it turns out, LAPACK does not have this issue: it produces the correct eigenvalues and eigenvectors (I checked them against my original CPU version).
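
A check of this kind can also be done without a reference implementation, by measuring the residual ||A v - lambda v|| for each computed eigenpair. The following is only an illustrative sketch in pure Python on a 2x2 Hermitian matrix with known eigenvalues 1 and 3; the function name max_residual and the test data are assumptions for the example, and for a matrix of size 3194 one would use an optimized BLAS routine instead.

```python
import math


def max_residual(a, eigenvalues, eigenvectors):
    """Return max over k of ||A v_k - lambda_k v_k||_2.

    `a` is a dense square matrix given as a list of rows;
    eigenvectors[k] is the vector paired with eigenvalues[k].
    """
    n = len(a)
    worst = 0.0
    for lam, v in zip(eigenvalues, eigenvectors):
        # r = A v - lambda v, computed row by row
        r = [sum(a[i][j] * v[j] for j in range(n)) - lam * v[i] for i in range(n)]
        worst = max(worst, math.sqrt(sum(abs(x) ** 2 for x in r)))
    return worst


# Small Hermitian example: eigenvalues 1 and 3 with normalized eigenvectors.
s = 1 / math.sqrt(2)
A = [[2, 1j], [-1j, 2]]
w = [1.0, 3.0]
V = [[-1j * s, s], [1j * s, s]]

print("eigenpairs consistent:", max_residual(A, w, V) < 1e-12)
```

A residual near machine precision (scaled by the norm of A) indicates that the eigenpairs are consistent with the matrix, regardless of which library produced them.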

Is this a MAGMA issue, and if so, are you aware of similar problems?
Is there an easy way to fix this?

Thanks,
Paul
Paul2822
 
Posts: 3
Joined: Sat Sep 14, 2013 1:57 pm

Re: ZHEEVD_GPU fails to converge

Post by Paul2822 » Thu Sep 19, 2013 4:33 pm

Hi all,

I narrowed the problem down to the magma_zstedx call within the zheevd_gpu.cpp file.

More precisely, I modified the magma_zheevd function to use lapackf77_zstemr instead of magma_zstedx.
This modification suffices to converge to the right solution.

What is confusing me, however, is that I'm not able to build a self-contained example of this bug (i.e. a separate binary which reads all the input arguments from a file).
My self-contained example always converges for exactly the same input arguments when using the original MAGMA version. Within our application, however, the original MAGMA version never converges.

I'm also linking my self-contained binary against some (not all) of the libraries that our application links against.
Can you think of any other problems that I might be running into?

Do you think that this is a MAGMA issue or an issue with the Bisection Algorithm?

Best,
Paul
Last edited by Paul2822 on Tue Sep 24, 2013 2:01 pm, edited 1 time in total.

Re: ZHEEVD_GPU fails to converge

Post by mgates3 » Mon Sep 23, 2013 12:12 pm

Thanks for the work hunting down where the problem occurs. It's possible there's some synchronization issue/race condition that occurs in your application, but not in the standalone example. Can you give specifics about what options you call magma_zheevd_gpu with? Does using a different routine such as magma_zheevd (CPU interface) or magma_zheevdx exhibit the same problem?

Also, would it be possible to get a copy of your matrix? Though this might be of limited value if we can't reproduce the problem in a standalone manner.
-mark
mgates3
 
Posts: 399
Joined: Fri Jan 06, 2012 2:13 pm

Re: ZHEEVD_GPU fails to converge

Post by Paul2822 » Tue Sep 24, 2013 2:01 pm

Hi Mark,

Yes, I could provide you with the matrix in binary format, but I don't know if this will help you :/
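
As an illustration of what such a binary exchange could look like, here is a minimal sketch. The file layout (a little-endian int64 holding n, followed by n*n pairs of float64 real and imaginary parts in column-major order) and the function names dump_matrix and load_matrix are assumptions made for this example; they are not the format actually used in our application.

```python
import struct


def dump_matrix(path, a):
    """Write a square complex matrix (list of rows) to `path`.

    Hypothetical layout: int64 n, then n*n (re, im) float64 pairs,
    column-major, little-endian.
    """
    n = len(a)
    with open(path, "wb") as f:
        f.write(struct.pack("<q", n))
        for j in range(n):          # columns first: column-major order
            for i in range(n):
                z = complex(a[i][j])
                f.write(struct.pack("<dd", z.real, z.imag))


def load_matrix(path):
    """Read a matrix written by dump_matrix and return it as a list of rows."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<q", f.read(8))
        a = [[0j] * n for _ in range(n)]
        for j in range(n):
            for i in range(n):
                re, im = struct.unpack("<dd", f.read(16))
                a[i][j] = complex(re, im)
    return a
```

Writing raw doubles rather than formatted text keeps the values bit-identical between the application and a standalone reproducer, which matters when a convergence failure may depend on the exact input.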

I was also thinking that the context of our application might cause problems for MAGMA, but I think we can rule this out: I moved the zheevd call to the very beginning of our main function, and the routine still does not converge.
Moreover, all inputs to the routine are read from the same file (jobz = 'V' and uplo = 'U').

I just tested magma_zheevdx_gpu and it is giving me the same error (i.e. it was not able to converge).

Best,
Paul

Re: ZHEEVD_GPU fails to converge

Post by mgates3 » Thu Sep 26, 2013 2:31 pm

What LAPACK (e.g., MKL) are you linking with? Can you post your make.inc file? Are there any differences between how MAGMA links with MKL and how your application links with MKL?
-mark