Hi,
I understand MAGMA algorithms are hybrid CPU + GPU. I have a real-time computer vision application that needs to solve an Ax=b linear system (A is 6x6, symmetric positive definite, double precision) multiple times per frame at 30 fps. The application currently runs mostly on the GPU, but the linear system is solved on the CPU with Eigen, which involves hundreds of GPU > CPU > GPU memory copies every second.
I would like to see how the application would perform if I could solve the linear system exclusively on the GPU and completely eliminate the GPU <-> CPU memory copies. Even if each individual solve is much slower on the GPU than on the CPU, that might be made up for by eliminating the CPU <-> GPU memory copies, kernel launches, etc.
If MAGMA can't help here, do you know of another implementation that might, or have any ideas on how to go about custom coding it with lower-level libraries (CUBLAS maybe?) for the 6x6 case? As you can probably tell, I'm not an expert on linear algebra, so if I haven't been clear enough please let me know.
Best Regards,
JP
Solve small Ax=b on GPU only with no CPU involvement?
Re: Solve small Ax=b on GPU only with no CPU involvement?
MAGMA does not currently have this capability, to solve many small matrices entirely on the GPU.
Given that your matrix is SPD, you can use a Cholesky factorization, which is nice because it has simpler control flow than the general LU factorization. No pivoting is required. Depending on the size and number of matrices to be solved simultaneously, either a single thread or a single block could do each factorization.
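As a rough illustration of the one-thread-per-matrix idea, here is a minimal sketch of an unblocked 6x6 Cholesky factorization. It is not MAGMA or cuBLAS code; the names cholesky6, cholesky6_kernel, HD and N are made up for this example, and row-major storage is an assumption. The HD macro lets the same function build as plain C++ on the host or as a device function under nvcc.

```cpp
#include <math.h>

#ifdef __CUDACC__
#define HD __host__ __device__   // nvcc build: callable from inside a kernel
#else
#define HD                       // plain C++ build: host only
#endif

constexpr int N = 6;             // assumed fixed problem size

// In-place Cholesky factorization A = L * L^T of a row-major N x N SPD matrix.
// Only the lower triangle of A is read, and it is overwritten with L.
// Returns false if a non-positive pivot appears (A is not numerically SPD).
HD inline bool cholesky6(double* A)
{
    for (int j = 0; j < N; ++j) {
        // Diagonal entry: L[j][j] = sqrt(A[j][j] - sum_k L[j][k]^2)
        double d = A[j * N + j];
        for (int k = 0; k < j; ++k) d -= A[j * N + k] * A[j * N + k];
        if (d <= 0.0) return false;
        d = sqrt(d);
        A[j * N + j] = d;

        // Column j below the diagonal
        for (int i = j + 1; i < N; ++i) {
            double s = A[i * N + j];
            for (int k = 0; k < j; ++k) s -= A[i * N + k] * A[j * N + k];
            A[i * N + j] = s / d;
        }
    }
    return true;
}

#ifdef __CUDACC__
// One thread per matrix: thread t factors the t-th 6x6 matrix of a batch
// that already lives in device memory. With count == 1 this handles a
// single system without any copy back to the host.
__global__ void cholesky6_kernel(double* As, int* ok, int count)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < count) ok[t] = cholesky6(As + t * N * N) ? 1 : 0;
}
#endif
```

There is no pivoting and no data-dependent branching apart from the SPD check, which is what makes the single-thread version practical at this size.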
When you say it is solved multiple times per frame, can those multiple times be in parallel, or does the result of one solve become an input for a subsequent solve?
Also, is the matrix changing for each solve, or do you just have different right-hand sides to solve with? If the matrix keeps changing, you have to refactor each time, as I assumed above. If just the right-hand side changes, then you could factor once (even on the CPU) and just use cublasXtrsm( ) twice to solve entirely on the GPU.
mark
Re: Solve small Ax=b on GPU only with no CPU involvement?
Hi Mark,
Thank you for the reply. One clarification. You said:
"MAGMA does not currently have this capability, to solve many small matrices entirely on the GPU."
Does it have the capability to solve a single small matrix entirely on the GPU? I'm not sure I was clear enough in my original post: my application flow requires the matrices to be solved individually, one at a time. I don't need to solve many at a time on the GPU, just one at a time, where the input data resides on the GPU and the result is provided on the GPU without any memory copy to, or involvement of, the CPU.
Best Regards,
JP.
Re: Solve small Ax=b on GPU only with no CPU involvement?
No, it can't currently factor a matrix entirely on the GPU. The current hybrid code always uses both the CPU and the GPU to factor the matrix. For matrices this small (n=6), it would actually end up factoring entirely on the CPU, even if the GPU interface were used.
mark

Re: Solve small Ax=b on GPU only with no CPU involvement?
The system contains at least an x86 CPU and a CUDA-capable GPU, is that right?