why not do all the work on GPU?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Posts: 1
Joined: Mon Jan 17, 2011 3:24 am

Post by rych » Mon Jan 17, 2011 4:13 am

I'm looking at magma_dgeqrf_gpu (dgeqrf_gpu.cpp), for example, and I see host allocations and data exchange between the CPU and the GPU. Couldn't we allocate all the memory (input, output, and workspace) on the GPU and call a sequence of GPU kernels to operate on it, without calling CPU LAPACK functions?

Stan Tomov
Posts: 283
Joined: Fri Aug 21, 2009 10:39 pm

Re: why not do all the work on GPU?

Post by Stan Tomov » Fri Jan 21, 2011 2:37 am

This is possible, and people have tried it, but in general it is slower. The tasks that are offloaded to the CPU are small and cannot be executed efficiently in parallel on the GPU. Instead, these small tasks can run on the CPU, overlapped with more efficient work (e.g., Level 3 BLAS) on the GPU. Asymptotically, for large matrices, the execution of the small tasks on the CPU is completely overlapped by the work on the GPU, and as a result the overall algorithm runs at the speed at which the GPU can execute the Level 3 BLAS.
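
To make the asymptotic argument concrete, here is a rough cost model, not MAGMA code. It assumes a blocked Householder QR of an n x n matrix with block size nb, where the CPU factors the next panel while the GPU updates the trailing matrix. The flop counts are the usual leading-order estimates, the two rates (CPU_PANEL_FLOPS, GPU_UPDATE_FLOPS) are made-up numbers for illustration, and CPU-GPU transfers are ignored.

```python
# Hypothetical sustained rates (assumptions for illustration, not measurements).
CPU_PANEL_FLOPS = 5e9     # CPU on the small, Level 2 BLAS-bound panel factorization
GPU_UPDATE_FLOPS = 300e9  # GPU on the Level 3 BLAS trailing-matrix update


def panel_flops(m, w):
    """Approximate flops for an unblocked Householder QR of an m x w panel."""
    return 2.0 * m * w * w - (2.0 / 3.0) * w ** 3


def update_flops(m, w, nc):
    """Approximate flops to apply a width-w block reflector to m x nc columns."""
    return 4.0 * m * w * nc


def gpu_busy_fraction(n, nb):
    """Fraction of the modeled run time during which the GPU is doing updates."""
    gpu_time = 0.0
    # The very first panel has no GPU work to hide behind.
    total_time = panel_flops(n, min(nb, n)) / CPU_PANEL_FLOPS
    for j in range(0, n, nb):
        m = n - j        # rows remaining at this step
        w = min(nb, m)   # width of the current panel
        nc = m - w       # trailing columns still to be updated
        update = update_flops(m, w, nc) / GPU_UPDATE_FLOPS
        # The next panel (if any) is factored on the CPU during the update.
        if nc > 0:
            next_panel = panel_flops(nc, min(nb, nc)) / CPU_PANEL_FLOPS
        else:
            next_panel = 0.0
        gpu_time += update
        # With overlap, each step costs whichever of the two takes longer.
        total_time += max(update, next_panel)
    return gpu_time / total_time


if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        print(n, round(gpu_busy_fraction(n, 64), 3))
```

Under these assumed rates, the CPU panel work dominates for small matrices, while for large ones the GPU update time per step exceeds the panel time and the busy fraction approaches 1. That is the sense in which the algorithm runs at Level 3 BLAS speed.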

Posts: 2
Joined: Tue Feb 15, 2011 6:09 am

Re: why not do all the work on GPU?

Post by lucky0002 » Thu Feb 17, 2011 1:31 am

Hello, everyone.
I am a student and new to GPGPU.
I have one question:
MAGMA is built on CUDA, and I have also heard of another library, ViennaCL.
Which one is better in terms of performance/speed?
