MAGMA does not currently have the capability to solve many small matrices entirely on the GPU.
Since your matrix is SPD (symmetric positive definite), you can use a Cholesky factorization. That is convenient because its control flow is simpler than that of a general LU factorization: no pivoting is required. Depending on the size of the matrices and how many are solved simultaneously, each factorization could be done by a single thread (very small matrices, many of them) or by a single thread block (somewhat larger matrices).
When you say it is solved multiple times per frame, can those solves run in parallel, or does the result of one solve become an input to the next?
Also, does the matrix change for each solve, or do you just have different right-hand sides? If the matrix keeps changing, you have to re-factor it each time, as I assumed above. If only the right-hand side changes, you could factor once (even on the CPU), copy the factor to the GPU, and then call cublas<t>trsm() (e.g. cublasDtrsm()) twice, once with the triangular factor and once with its transpose, to do the solves entirely on the GPU.
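For reference, the two triangular solves amount to a forward substitution with L followed by a back substitution with L^T. The minimal C sketch below shows what they compute for a single right-hand side. The function name cholesky_solve and the row-major layout are assumptions for illustration; note that cuBLAS itself expects column-major storage, so the actual trsm arguments would need to account for that.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (hypothetical helper): solve A x = b given a Cholesky factor L
 * with A = L * L^T. L is stored row-major in the lower triangle of l
 * (n x n). b is overwritten with the solution x. */
void cholesky_solve(const double *l, int n, double *b)
{
    assert(l != NULL && b != NULL && n >= 0);

    /* First triangular solve, forward substitution: L y = b */
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k)
            s -= l[i * n + k] * b[k];
        b[i] = s / l[i * n + i];
    }

    /* Second triangular solve, back substitution with the transpose:
     * L^T x = y, where L^T[i][k] = L[k][i] */
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int k = i + 1; k < n; ++k)
            s -= l[k * n + i] * b[k];
        b[i] = s / l[i * n + i];
    }
}
```

For example, with L = [[2, 0], [1, 1]] (so A = [[4, 2], [2, 2]]) and b = [8, 6], the result is x = [1, 2]. Because the factor is reused, each new right-hand side costs only these two O(n^2) passes instead of a new O(n^3) factorization.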