Hi,
I understand MAGMA algorithms are hybrid CPU + GPU. I have a real-time computer vision application which needs to solve an Ax=b linear system (A is 6x6, symmetric positive definite, double precision) multiple times per frame at 30 fps. The application currently runs mostly on the GPU, but the linear system is solved on the CPU with Eigen, which involves hundreds of GPU -> CPU -> GPU memory copies every second.
I would like to see how the application performs if the linear system is solved exclusively on the GPU, eliminating the GPU-CPU memory copies completely. Even if each individual solve is much slower on the GPU than on the CPU, the loss might be made up for by eliminating the CPU-GPU memory copies, kernel launches, etc.
If MAGMA can't help here, do you know of another implementation that might, or have any ideas on how to go about custom coding it with lower-level libraries (cuBLAS maybe?) for the 6x6 case? As you can probably tell, I'm not an expert on linear algebra, so if I haven't been clear enough please let me know.
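To make the "custom coding" option concrete, here is a minimal sketch of what a hand-written 6x6 solver might look like, assuming a Cholesky factorization (A = L L^T) followed by forward and back substitution. The function name `solve_spd6`, the row-major layout and the `HOST_DEVICE` macro are illustrative assumptions, not part of MAGMA, cuBLAS or Eigen. It compiles as plain C++, and the macro is only there so the same function could be called from a CUDA kernel.

```cpp
#include <cmath>

// Assumption: when compiled with nvcc the function is usable on host and device;
// with an ordinary C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

constexpr int N = 6;

// Solves A x = b for a 6x6 symmetric positive definite A stored row-major.
// Returns false if a non-positive pivot shows that A is not positive definite.
HOST_DEVICE inline bool solve_spd6(const double* A, const double* b, double* x)
{
    double L[N * N] = {0.0};

    // Cholesky factorization: build the lower triangle L column by column.
    for (int j = 0; j < N; ++j) {
        double d = A[j * N + j];
        for (int k = 0; k < j; ++k) d -= L[j * N + k] * L[j * N + k];
        if (!(d > 0.0)) return false;
        L[j * N + j] = std::sqrt(d);
        for (int i = j + 1; i < N; ++i) {
            double s = A[i * N + j];
            for (int k = 0; k < j; ++k) s -= L[i * N + k] * L[j * N + k];
            L[i * N + j] = s / L[j * N + j];
        }
    }

    // Forward substitution: solve L y = b.
    double y[N];
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= L[i * N + k] * y[k];
        y[i] = s / L[i * N + i];
    }

    // Back substitution: solve L^T x = y.
    for (int i = N - 1; i >= 0; --i) {
        double s = y[i];
        for (int k = i + 1; k < N; ++k) s -= L[k * N + i] * x[k];
        x[i] = s / L[i * N + i];
    }
    return true;
}
```

The idea would be to call something like this from inside an existing kernel (for example one thread per system), so that A, b and x never leave GPU memory. Whether that actually beats the current Eigen path is exactly what I would like to find out.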
Best Regards,
JP
