I'm testing a CUDA framework that I'm developing, and I'd like to use MAGMA to implement a blocked Cholesky factorization, which needs the dpotrf, dtrsm, dgemm and dsyrk kernels. For my experiments, these functions must be asynchronous, and I need to choose the CUDA stream each one is launched on, because I will later synchronize them using CUDA events. Ideally, the kernels should run entirely on the GPU.
However, I read in a previous post that the dpotrf function is synchronous because part of it runs on the CPU, and that setting the CUDA stream in MAGMA is not thread-safe. Is this still true in the latest MAGMA release?
I know I can use cuBLAS for dtrsm, dgemm and dsyrk, which run asynchronously on the CUDA stream I set, but I also need dpotrf. Is it possible to use the MAGMA kernels in this way?