?getf2_gpu question (1.4.0-beta2)

Postby jah87 » Mon Jul 08, 2013 11:18 am

I'm very pleased to see the addition of the GPU-only LU factorization. This will be very helpful for my application, and for many others, I'm sure. However, I am worried that the implementation is still not entirely asynchronous with respect to the launching CPU thread. Although all of the work is launched as independent kernels, there still appear to be a number of CPU dependencies, particularly when MAGMA is used with multiple CPU cores sharing a single GPU:
    1) The legacy CUBLAS API prevents concurrent execution from multiple CPU threads.
    2) Conditional statements following the call to cublasIdamax create an implicit synchronization between the host and device, because the host has to wait for the pivot index before it can branch on it.
    3) Changing the device cache configuration (cudaDeviceSetCacheConfig) forces a global device synchronization, and consequently CPU thread synchronization.

With these points in mind, is there any work being done to implement context switching/multiple streams in these routines and/or in MAGMA as a whole? Many of the techniques are already available in the CUDA sample cdpLUDecomposition, which uses CUDA Dynamic Parallelism to perform a right-looking, level 3 BLAS version of LU decomposition with partial pivoting entirely on the device. Something akin to this example, with the ability to perform the factorization in a single kernel launch, would be very beneficial to me, and I'd wager to many others.
jah87
 
Posts: 21
Joined: Tue May 01, 2012 1:54 pm

Re: ?getf2_gpu question (1.4.0-beta2)

Postby jah87 » Tue Dec 03, 2013 8:54 pm

I haven't seen any updates on this in a while.

What is the current status of task-level parallel factorizations/solves on the GPU only (not hybrid), i.e. batching? I know there were some people working on this; I'm just curious whether any progress has been made.
jah87
 
Posts: 21
Joined: Tue May 01, 2012 1:54 pm

Re: ?getf2_gpu question (1.4.0-beta2)

Postby mgates3 » Wed Dec 04, 2013 8:36 pm

The panels (getf2, potf2, geqr2) are in the current release. Being memory bound, with limited parallelism and more control flow (e.g., pivot search), they do not currently achieve very high performance. It's faster to use the CPU for the panel and allow the GPU to simultaneously do high performance GEMMs. Even for batching small matrices, the CPU is quite fast because small matrices will fit in the L2 or L3 cache. But we're still exploring ideas.
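
For readers who have not looked inside a panel routine, here is a minimal CPU-only sketch of unblocked LU with partial pivoting, in the spirit of LAPACK's getf2. It is not MAGMA's implementation; the name lu_unblocked and its signature are made up for illustration. It shows the three steps per column (pivot search, column scale, rank-1 update) and the data-dependent branch on the pivot that makes the panel awkward to run fully asynchronously.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper, not MAGMA code: unblocked LU with partial pivoting
// on a column-major n x n matrix stored in a (leading dimension n).
// On return, a holds the unit-lower factor L below the diagonal and U on
// and above it. ipiv[k] is the 0-based row that was swapped with row k.
// Returns 0 on success, or k + 1 for the first step k with a zero pivot.
int lu_unblocked(std::vector<double>& a, int n, std::vector<int>& ipiv) {
    ipiv.assign(n, 0);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        // Pivot search (the idamax step): largest |a(i,k)| for i >= k.
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        ipiv[k] = p;

        // Data-dependent branch: everything below depends on the pivot.
        if (a[p + k * n] == 0.0) {
            if (info == 0) info = k + 1;
            continue;  // column is all zero from row k down; nothing to update
        }
        if (p != k)
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);

        // Scale the column below the pivot (the scal step).
        for (int i = k + 1; i < n; ++i) a[i + k * n] /= a[k + k * n];

        // Rank-1 update of the trailing submatrix (the ger step).
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
    }
    return info;
}
```

Each column step does O(n^2) work on O(n^2) data and has to finish its pivot search before anything else can start, which is the memory-bound, control-heavy behaviour described above.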
-mark
mgates3
 
Posts: 427
Joined: Fri Jan 06, 2012 2:13 pm

Re: ?getf2_gpu question (1.4.0-beta2)

Postby jah87 » Thu Dec 05, 2013 5:28 pm

mgates3 wrote: The panels (getf2, potf2, geqr2) are in the current release. Being memory bound, with limited parallelism and more control flow (e.g., pivot search), they do not currently achieve very high performance. It's faster to use the CPU for the panel and allow the GPU to simultaneously do high performance GEMMs. Even for batching small matrices, the CPU is quite fast because small matrices will fit in the L2 or L3 cache. But we're still exploring ideas.
-mark

Hi-

Absolutely, the CPU is faster than the GPU for the panel LU of small matrices. In my case, I have anywhere from 200-400 matrices of size ~160 on each core of an 8-16 core processor, all sharing 1 GPU. There are plenty of flops to be done to make using the GPU advantageous. A hybrid approach could work, but the current routines (dgetrf_gpu, for example) involve a lot of implicit synchronization with the CPU, because MAGMA currently solves one system at a time. This has so far prevented me from using the GPU effectively. I've made some progress on this with streams, but I was only able to get so far, and the result was not optimized (see this post: http://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=897).

I have high hopes for one of the new routines in CUDA 5.5 (getrfBatched, I think), and I was just curious whether such methods are also being looked at in MAGMA.
jah87
 
Posts: 21
Joined: Tue May 01, 2012 1:54 pm

