Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
I am using the magma_dgeqrf2_mgpu function to do QR factorization on multiple GPUs. For matrices smaller than 5000 x 5000, I did not see much performance improvement when using 2 GPUs rather than 1 GPU. I want to know why this happens and whether there is a better function for QR factorization on multiple GPUs for small dense matrices.
- Posts: 7
- Joined: Mon May 12, 2014 12:38 pm
This is not surprising. There are overheads in a multi-GPU implementation, such as copying pieces of the matrix twice to two GPUs. Only when the matrix is sufficiently large do the computational savings outweigh the added overheads. It also depends a lot on the specific CPU, GPU, and implementation of LAPACK and BLAS that you are using. You can try tuning the block size (NB) in control/get_nb.cpp.
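To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the throughput, PCIe bandwidth, and per-panel synchronization cost are made-up round numbers, the function names are hypothetical, and the communication pattern (each panel sent once to each of the two GPUs, with no overlap between transfers and compute) is a simplifying assumption, not a description of what magma_dgeqrf2_mgpu actually does. The only standard ingredient is the approximate 4n^3/3 flop count for Householder QR of an n x n matrix.

```python
def qr_flops(n):
    """Approximate flop count for Householder QR of an n x n matrix."""
    return 4.0 * n**3 / 3.0


def time_one_gpu(n, gflops=1000.0):
    """Estimated seconds on one GPU; gflops is an assumed sustained rate."""
    return qr_flops(n) / (gflops * 1e9)


def time_two_gpus(n, nb=128, gflops=1000.0, pcie_gb_s=10.0, sync_s=50e-6):
    """Estimated seconds on two GPUs under the toy model described above.

    All default values are assumptions chosen for illustration only.
    """
    # Ideal case: the arithmetic is split evenly across the two GPUs.
    compute = qr_flops(n) / (2 * gflops * 1e9)
    # Number of panels of width nb (ceiling division).
    panels = -(-n // nb)
    # Upper bound: each panel is treated as a full n x nb block of doubles
    # (8 bytes each) and is copied once to each of the two GPUs.
    transfer = panels * 2 * (n * nb * 8) / (pcie_gb_s * 1e9)
    # Fixed synchronization cost paid once per panel.
    sync = panels * sync_s
    return compute + transfer + sync


if __name__ == "__main__":
    for n in (1000, 2000, 5000, 10000, 20000):
        t1 = time_one_gpu(n)
        t2 = time_two_gpus(n)
        print(f"n={n:6d}  1 GPU: {t1 * 1e3:10.2f} ms  "
              f"2 GPUs: {t2 * 1e3:10.2f} ms  speedup: {t1 / t2:.2f}")
```

In this model the compute term grows like n^3 while the transfer term grows like n^2, so the second GPU only pays off once n is large enough. Changing nb mainly changes how many times the per-panel synchronization cost is paid, which is one reason the block size is worth tuning on your own hardware.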
- Posts: 587
- Joined: Fri Jan 06, 2012 2:13 pm