parallel execution of multiple magma_dsyevd

Open discussion for MAGMA

Post by haskell » Mon Jul 04, 2011 2:08 am

The domain I am working in is possibly unusual in that it requires the eigendecomposition of hundreds of small matrices (e.g., 5x5). All of these matrices are ready at the same time, having been produced by a sequence of CUDA 4.0 kernels.

Each of these decompositions is far too small to keep the device busy. In a similar situation with CUBLAS, one would create a stream per matrix operation and call cublasSetStream for that operation; since 16 streams can execute at the same time, the device is better utilized (hopefully a future hardware release will let 64 CUDA streams execute at the same time).

What does one do in this situation with MAGMA?
I imagine that the hybrid CPU/GPU nature of MAGMA diminishes or eliminates the utility of cublasSetStream.
Also, assume the CPU supports only 4 parallel hardware threads.

Re: parallel execution of multiple magma_dsyevd

Post by Stan Tomov » Thu Jul 07, 2011 12:13 pm

So far MAGMA has not targeted this type of problem, but we are looking into it, e.g., in connection with spectral element agglomeration AMG. I would be interested to know what your application is. Thanks.

I can think of several ways to organize the computation for these problems. One way is, as you mentioned, to use streams. This may, though, require the matrices to be somewhat larger so that each computation can be done efficiently on a multiprocessor. Another way may be to have a single thread deal with each 5x5 matrix. In this case one has to think about what data structures to use, for example, interleaving 32 (or more) 5x5 matrices to ensure coalesced reads, etc.

Re: parallel execution of multiple magma_dsyevd

Post by haskell » Fri Jul 08, 2011 1:46 am

The application is a parallel implementation of the subset of Natural Bond Orbital (NBO) analysis relevant to my project. NBO (http://www.chem.wisc.edu/~nbo5/) is immensely useful in quantum chemistry and is the most rational and accurate of the quantum analyses. NBO is presently coded in single-threaded Fortran. Quantum chemistry is about to see a huge speed increase in the form of TeraChem, but NBO (which runs after the electronic structure system, TeraChem in this case) is not close to keeping up. TeraChem (http://www.petachem.com/) is delivering 1000 times the uniprocessor performance of the mature quantum package GAMESS for larger molecules.

