Thanks for the reply. Here are the matrices and their sizes. I have already done exactly what the testing_dgetrf_gpu code does:
n = 1024;
double *d_U, *d_Utmp; int n32 = ((n+31)/32)*32;
magma_int_t *ipiv; int info;
TESTING_DEVALLOC( d_Utmp, double, n32 * n);
TESTING_MALLOC(ipiv, magma_int_t, n );
magma_dgetrf_gpu( n, n, d_Utmp, n32, ipiv, &info);
Here TESTING_DEVALLOC and TESTING_MALLOC apply the (void**) cast and sizeof(...) to allocate on the device and on the host, respectively.
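For reference, the size arithmetic behind the snippet above can be sketched on its own. This is only a minimal sketch: the helper names round_up_32, padded_elems and ld_args_consistent are made up for illustration and are not part of MAGMA. The one rule it relies on is the usual LAPACK-style requirement that the leading dimension is at least max(1, m).

```c
#include <stddef.h>

/* Hypothetical helper (not MAGMA API): round n up to the next multiple
 * of 32, the same padding used to compute n32 above. */
static int round_up_32(int n) {
    return ((n + 31) / 32) * 32;
}

/* Hypothetical helper: number of elements in a column-major buffer
 * with leading dimension ldda and n_cols columns. */
static size_t padded_elems(int ldda, int n_cols) {
    return (size_t)ldda * (size_t)n_cols;
}

/* Hypothetical helper: returns 1 when an m x n column-major matrix with
 * leading dimension ldda fits in a buffer of alloc_elems elements,
 * i.e. ldda >= max(1, m) and ldda * n <= alloc_elems; returns 0 otherwise. */
static int ld_args_consistent(int m, int n, int ldda, size_t alloc_elems) {
    int min_ld = (m > 1) ? m : 1;
    if (m < 0 || n < 0 || ldda < min_ld) {
        return 0;
    }
    return padded_elems(ldda, n) <= alloc_elems;
}
```

With n = 1024, round_up_32(n) is 1024, so the buffer holds n32 * n = 1048576 doubles and the leading dimension passed to magma_dgetrf_gpu matches the allocation.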
I also tried the other allocation routines, as follows:
magma_imalloc_cpu( &ipiv, n);
magma_dmalloc( &d_Utmp, n32*n);
magma_dgetrf_gpu( n, n, d_T, n32, ipiv, &info);
But I am still getting the same errors.