why not do all the work on GPU?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

why not do all the work on GPU?

Postby rych » Mon Jan 17, 2011 4:13 am

I'm looking at magma_dgeqrf_gpu (dgeqrf_gpu.cpp), for example, and see host allocations and data exchange between CPU and GPU. Can't we allocate all the memory, including input, output, and workspace, on the GPU and call a sequence of GPU kernels to operate on it, without calling CPU LAPACK functions at all?
Posts: 1
Joined: Mon Jan 17, 2011 3:24 am

Re: why not do all the work on GPU?

Postby Stan Tomov » Fri Jan 21, 2011 2:37 am

This is possible, and people have tried it, but it is in general slower. The computations/tasks that are offloaded to the CPU are small and cannot be executed efficiently in parallel on the GPU. These small tasks can instead run on the CPU, overlapped with more efficient work (e.g., Level 3 BLAS) on the GPU. What happens is that asymptotically, for large matrices, the execution of the small tasks on the CPU is totally overlapped by work on the GPU, and as a result the overall algorithm runs at the speed at which one can execute the Level 3 BLAS on the GPU.
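To make the overlap pattern concrete, here is a minimal sketch of such a hybrid loop. This is NOT MAGMA's actual dgeqrf_gpu code: for simplicity it shows an LU-like blocked update (a single GEMM) rather than QR's block-Householder update, `panel_factorize_cpu` is a hypothetical placeholder for the small CPU LAPACK panel factorization, and error checking and the triangular solves are omitted. The CUDA runtime and cuBLAS calls themselves are real.

```cuda
// Hedged sketch of the hybrid CPU/GPU factorization loop (not MAGMA source).
// Assumes dA is an n-by-n column-major matrix already on the GPU (lda = n)
// and hPanel is a pinned host buffer of at least n*nb doubles.

#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical CPU panel factorization; in a real code this would call
// LAPACK on the host (e.g., dgetrf/dgeqrf on a tall, thin panel).
void panel_factorize_cpu(double *panel, int rows, int cols);

void hybrid_factor_sketch(cublasHandle_t handle, double *dA, int n, int nb,
                          double *hPanel)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cublasSetStream(handle, stream);
    const double one = 1.0, minus_one = -1.0;

    for (int j = 0; j + nb <= n; j += nb) {
        int rows = n - j;

        // 1. The small panel travels to the CPU (the host buffer must be
        //    pinned for the copy to be truly asynchronous).
        cudaMemcpy2DAsync(hPanel, rows * sizeof(double),
                          dA + (size_t)j * n + j, n * sizeof(double),
                          rows * sizeof(double), nb,
                          cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);

        // 2. The small, latency-bound task runs on the CPU -- this is the
        //    part that would waste the GPU's parallelism if run there.
        panel_factorize_cpu(hPanel, rows, nb);

        // 3. The panel goes back, and the large Level 3 BLAS trailing-matrix
        //    update is queued on the GPU stream. The CPU returns immediately,
        //    so (with look-ahead) it can start the next panel while the GPU
        //    is busy -- asymptotically the CPU work is completely hidden.
        cudaMemcpy2DAsync(dA + (size_t)j * n + j, n * sizeof(double),
                          hPanel, rows * sizeof(double),
                          rows * sizeof(double), nb,
                          cudaMemcpyHostToDevice, stream);

        int trailing = n - j - nb;
        if (trailing > 0) {
            cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        trailing, trailing, nb, &minus_one,
                        dA + (size_t)j * n + (j + nb), n,        // A21
                        dA + (size_t)(j + nb) * n + j, n,        // A12
                        &one,
                        dA + (size_t)(j + nb) * n + (j + nb), n); // A22
        }
    }
    cudaStreamDestroy(stream);
}
```

The design point is step 3: because the GEMM is asynchronous with respect to the host, the O(n*nb^2) panel work on the CPU hides behind the O(n^2*nb) update on the GPU, which is why the hybrid code approaches pure Level 3 BLAS speed on large matrices.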
Stan Tomov
Posts: 258
Joined: Fri Aug 21, 2009 10:39 pm

Re: why not do all the work on GPU?

Postby lucky0002 » Thu Feb 17, 2011 1:31 am

Hello, everyone. I am a student and new to GPGPU. I have one question: MAGMA is built on CUDA, and I have heard of one other library, ViennaCL. Which one is better in terms of performance/speed?
Posts: 2
Joined: Tue Feb 15, 2011 6:09 am
