Hey again,
Thanks for the info. To set the record straight: turning the smoothing option back on improved the solving time rather than worsening it, my bad. The poor IDR performance (not converging) in the simple case (didr.cpp) that I mentioned was actually due to smoothing being off. So in my case, smoothing has a GREAT beneficial impact on convergence behavior. Still, smoothing=0/1 should imo be handled via opts rather than at compile time; omega's value has a trivial impact compared to that, and my investigation showed practically no effect, since it's only a starting value, I suppose, right?
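To illustrate what I mean by handling it via opts, something like the following would be ideal from the user's side. To be clear, this is purely hypothetical: as far as I know a `smoothing` field on the solver options is not part of the released MAGMA API, it is just what I would like to be able to write.

    // Hypothetical sketch: selecting IDR smoothing at runtime through the
    // options struct instead of a compile-time switch. The `smoothing`
    // field is an ASSUMPTION for illustration, not existing MAGMA API.
    magma_dopts opts = {};
    opts.solver_par.solver    = Magma_IDR;
    opts.solver_par.maxiter   = 1000;
    opts.solver_par.rtol      = 1e-10;
    opts.solver_par.smoothing = 1;   // hypothetical: 1 = smoothing on, 0 = off
    // omega would then only seed the first iteration, which matches why I
    // see no measurable effect from its exact value.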
Regarding your question: I work in CFD, so my matrices mainly come from finite-element discretizations of the Navier-Stokes equations, k-epsilon turbulence models, etc. They are of course sparse, with different characteristics per generating equation. I am exploring MAGMA solver/preconditioner performance, especially your _merge implementations, on various matrices and sizes. So far they look very promising :) There are some sparse solvers I had never heard of, like bombardment (though I am nowhere near an LA expert), so I might give those a try too!
Re: IDR performance?
>> There are some sparse solvers I had never heard of, like bombardment (though I am nowhere near an LA expert), so I might give those a try too!
bombardment is not a Krylov solver by itself: it combines a number of Krylov solvers in an interleaved fashion, e.g. QMR, CGS, BiCGSTAB. The idea is: if I have no idea which solver to use, I run a bunch of them.
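In pseudocode, the control flow is roughly the following. This is only a sketch of the idea, not MAGMA's actual implementation, and the `one_step` interface is assumed for illustration:

    // Sketch of the bombardment idea: advance several Krylov solvers one
    // iteration at a time and stop as soon as any of them converges.
    #include <functional>
    #include <vector>

    struct Solver {
        // Advances the method by one iteration and returns the
        // current residual norm (assumed interface).
        std::function<double()> one_step;
    };

    bool bombard(std::vector<Solver>& solvers, double tol, int maxiter) {
        for (int it = 0; it < maxiter; ++it) {
            for (auto& s : solvers) {      // interleave QMR, CGS, BiCGSTAB
                if (s.one_step() < tol)    // first solver to converge wins
                    return true;
            }
        }
        return false;
    }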
Did you look into preconditioning? Does ILU give you good benefits?
Thanks, Hartwig
Re: IDR performance?
Yes, I have been looking into preconditioning; actually, most of my systems do not converge unless properly preconditioned. Block Jacobi and ILU have given me the best results so far, depending on the matrix characteristics. I will run quite a few tests for all possible solver/preconditioner combos, and I can share the results if you want :)
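Something like the following sweep is what I have in mind, loosely following the pattern of the testing drivers. Exact struct, enum, and function names may differ between MAGMA versions, and I assume dA, db, dx are device-side magma_d_matrix handles and queue an existing magma_queue_t:

    // Sweep all solver/preconditioner combinations and record the stats
    // that the solver writes back into solver_par (iterations, runtime).
    magma_solver_type solvers[]  = { Magma_IDR, Magma_BICGSTAB, Magma_GMRES };
    magma_solver_type preconds[] = { Magma_NONE, Magma_JACOBI, Magma_ILU };

    for (auto s : solvers) {
        for (auto p : preconds) {
            magma_dopts opts = {};
            opts.solver_par.solver  = s;
            opts.solver_par.maxiter = 1000;
            opts.solver_par.rtol    = 1e-10;
            opts.precond_par.solver = p;
            magma_d_precondsetup(dA, db, &opts.solver_par, &opts.precond_par, queue);
            magma_d_solver(dA, db, &dx, &opts, queue);
            // opts.solver_par.numiter / .runtime now hold the results
        }
    }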
Since my matrices are explicitly available at runtime, my next step is to try to spread the too-large-to-fit matrices across more than one GPU, since the typical 2-4 GB of device RAM is never enough, though I really don't know how that can be implemented in MAGMA. I understand there is OpenMP/pthreads support and that some functions have multi-GPU equivalents. Have you ever played around with more than one GPU? My feeling is that since you are already spawning different streams for some of your implementations, you are quite close to going multi-GPU :)
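As a starting point, I imagine creating one MAGMA queue per device and keeping each subdomain resident on its own GPU, with the boundary exchange left to me. A rough sketch, assuming the MAGMA 2.x style where magma_queue_create takes the device id:

    // One queue per GPU; each queue is bound to a single device.
    #include <vector>

    magma_device_t devices[8];
    magma_int_t ndev;
    magma_getdevices(devices, 8, &ndev);

    std::vector<magma_queue_t> queues(ndev);
    for (magma_int_t d = 0; d < ndev; ++d)
        magma_queue_create(devices[d], &queues[d]);

    // ... transfer each domain partition to its device (magma_dmtransfer),
    // run the local SpMV/solver work per queue, exchange boundary data
    // manually ...

    for (magma_int_t d = 0; d < ndev; ++d)
        magma_queue_destroy(queues[d]);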

Re: IDR performance?
Yes, we actually have some multi-GPU code that is not released. It looks like most people only use one GPU, maybe because most GPUs nowadays have a good amount of memory; the Tesla line typically has 6-12 GB.
Are the matrices confidential? Otherwise, I would appreciate it if you could provide me with one example matrix; then I can also take a look at what works. In particular, I am working on some new preconditioning techniques that may work very well.
Also, do the systems arise in a sequence, or are they individual systems? If they arise in a sequence, you may want to look into updating an existing ILU preconditioner instead of always generating a new one: http://www.netlib.org/utk/people/JackDo ... LU_GPU.pdf
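The reuse pattern would look roughly like this: set the preconditioner up once, reuse the factors for the following systems in the sequence, and only redo (or update) the setup when the iteration counts start to climb. The refresh heuristic and the assemble_next_system placeholder are illustrative, not from the paper:

    // Factor once, reuse across the sequence, refresh when factors go stale.
    magma_d_precondsetup(dA, db, &opts.solver_par, &opts.precond_par, queue);
    for (int k = 0; k < nsystems; ++k) {
        assemble_next_system(&dA, &db);             // placeholder for your update
        magma_d_solver(dA, db, &dx, &opts, queue);  // reuses existing factors
        if (opts.solver_par.numiter > iter_budget)  // stale: recompute/update ILU
            magma_d_precondsetup(dA, db, &opts.solver_par, &opts.precond_par, queue);
    }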
Re: IDR performance?
Hmm, about the matrices I will need to ask and get back to you on that one :) If allowed, I'll be more than happy to provide one.
I think they could arise in a sequence, so I'll have a look at your suggestion! Regarding multi-GPU: the thing is that for some matrix-free implementations, or for matrix-dependent cases with the domain decomposed over multiple devices (imagine a cubic domain split into 8 parts, each handled by a single GPU, with only the boundaries intercommunicating), multi-GPU can actually multiply the performance; plus, not everyone can afford a Tesla for the single-GPU cases :) The multi-GPU code you mention that is not released: is that because it's closed source, or due to not enough interest from the community? I suppose I could do the partitioning myself and use standard single-GPU MAGMA plus OpenMP/MPI, handling the GPU-to-GPU communication myself, along the lines of the sketch below.
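The do-it-yourself route I have in mind would be one MPI rank per GPU, standard single-GPU MAGMA on each rank, and halo exchange over MPI; nothing here relies on MAGMA-specific multi-GPU support. A minimal skeleton:

    // One MPI rank per GPU; each rank solves its subdomain with
    // single-GPU MAGMA and exchanges boundary values over MPI.
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, ngpus;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaGetDeviceCount(&ngpus);
        cudaSetDevice(rank % ngpus);   // pin this rank to one GPU

        // ... magma_init(); build the local subdomain system, solve it with
        // single-GPU MAGMA, then swap halo values with the neighbor rank:
        // MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, nbr, 0,
        //              recvbuf, n, MPI_DOUBLE, nbr, 0,
        //              MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }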