1) You can make magma_int_t 64-bit. You need to link with an ILP64 BLAS and LAPACK library. See make.inc.mkl-ilp64.
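As a rough illustration, an ILP64 build typically combines a preprocessor define with the ILP64 interface library. The exact variable names and flags below are a hypothetical sketch in the spirit of make.inc.mkl-ilp64; check the shipped file for the authoritative settings for your compiler and MKL version:

```make
# Hypothetical make.inc excerpt (assumed names; see make.inc.mkl-ilp64):
CFLAGS  += -DMKL_ILP64 -DMAGMA_ILP64     # 64-bit integers in MKL and magma_int_t
FFLAGS  += -fdefault-integer-8           # gfortran: 8-byte default INTEGER
LIB      = -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lpthread -lm
```

The key point is that every library in the link line must agree on the integer width; mixing LP64 and ILP64 libraries leads to silent corruption of dimension arguments.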
2) MAGMA will probably not help with such a tall skinny matrix. MAGMA does a panel factorization on the CPU, followed by updating the trailing matrix on the GPU. The panel size depends on the matrix size, but is always >= 32. Since your entire matrix has fewer columns than that, it will do the entire factorization on the CPU and no work on the GPU. You could change nb to something small like 8, but I expect the performance would be poor. See control/get_nb.cpp.
You could transpose the matrix and then do QR of the transpose, which yields an LQ^T factorization of the original matrix. That should be fast with MAGMA. (Sadly, calling MAGMA's LQ on the transposed matrix won't help; MAGMA's LQ internally does a transpose and QR anyway.)
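The transpose trick is plain linear algebra: if A^T = QR, then A = R^T Q^T = L Q^T with L lower trapezoidal. A minimal numpy sketch (sizes are made up for illustration):

```python
import numpy as np

# Hypothetical tall skinny matrix: many more rows than columns.
m, n = 100000, 8
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))

# QR of the transpose: A^T = Q R, so A = R^T Q^T = L Q^T.
Q, R = np.linalg.qr(A.T)   # reduced mode: Q is n x n, R is n x m
L = R.T                    # m x n lower trapezoidal factor

# The factorization reconstructs A.
assert np.allclose(L @ Q.T, A)
```

The same identity applies with MAGMA's QR on the transposed matrix: the short wide shape gives the GPU a trailing matrix worth updating, which the original tall skinny shape does not.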
3) PLASMA might be a better option for a tall skinny matrix, using multi-core CPUs. It has a hierarchical QR function to achieve parallelism. http://icl.cs.utk.edu/plasma/