I am working on a problem where I need to compute the LU factorization of a large matrix (20000 x 20000 entries, or more) on the GPU using MAGMA. The matrix is too big to fit in the memory of our GPU. Does MAGMA have any support for LU factorization of matrices of this size? For instance, by splitting the factorization into blocks that fit in GPU memory and overlapping computation with data transfer, or something similar.
Afterwards I will also use the factorization to solve linear systems, so the functions I am interested in are magma_dgetrf_gpu and magma_dgetrs_gpu.
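To make the sizes concrete, here is a rough sizing sketch in plain C. It only does the arithmetic for a column-major matrix of doubles with leading dimension n; the function names (matrix_bytes, columns_that_fit, panel_count) are made up for illustration and are not part of MAGMA, and the memory budget is a hypothetical number rather than a measurement of our card.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store an n x n matrix of doubles
   (column-major, leading dimension assumed equal to n). */
uint64_t matrix_bytes(uint64_t n)
{
    return n * n * (uint64_t)sizeof(double);
}

/* Largest number of full columns of length n that fit in budget_bytes.
   budget_bytes is a hypothetical amount of free GPU memory. */
uint64_t columns_that_fit(uint64_t n, uint64_t budget_bytes)
{
    assert(n > 0);
    return budget_bytes / (n * (uint64_t)sizeof(double));
}

/* Number of column panels of width nb needed to cover all n columns
   (the last panel may be narrower than nb). */
uint64_t panel_count(uint64_t n, uint64_t nb)
{
    assert(nb > 0);
    return (n + nb - 1) / nb;
}
```

For n = 20000, matrix_bytes gives 3,200,000,000 bytes (about 3.2 GB) for the matrix alone. With an assumed budget of 1,000,000,000 bytes, columns_that_fit(20000, 1000000000) is 6250, so panel_count(20000, 6250) is 4 panels. A real blocked scheme would also need workspace and room for the panels it updates against, so this is an upper bound on the panel width, not a plan.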