Yes, most MAGMA functions, including dpotrf, are hybrid: they do part of the work on the CPU (for dpotrf, the panel factorization, dpotf2), so they won't be asynchronous. MAGMA does have a Cholesky panel, magma_dpotf2_gpu, that runs completely on the GPU, and you could use it to build a dpotrf that runs completely on the GPU. Panel operations tend to be slow on the GPU, though, so the entire factorization may end up slower. You can set MAGMA's stream beforehand, but if other threads are also setting MAGMA's stream, you would currently need to modify dpotf2 to take a stream argument in order to be thread-safe. Eventually, the MAGMA API will change so that a stream is passed into each function.