MAGMA
2.3.0
Matrix Algebra for GPU and Multicore Architectures

Functions  
magma_int_t  magma_cgetf2_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dW0_displ, magmaFloatComplex **dW1_displ, magmaFloatComplex **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
CGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_dgetf2_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dW0_displ, double **dW1_displ, double **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
DGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_sgetf2_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dW0_displ, float **dW1_displ, float **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
SGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_zgetf2_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dW0_displ, magmaDoubleComplex **dW1_displ, magmaDoubleComplex **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
ZGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
void  magma_cgetf2trsm_batched (magma_int_t ib, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue) 
cgetf2trsm solves a matrix equation on the GPU.
magma_int_t  magma_cgetf2_sm_batched (magma_int_t m, magma_int_t ib, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
CGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
void  magma_dgetf2trsm_batched (magma_int_t ib, magma_int_t n, double **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue) 
dgetf2trsm solves a matrix equation on the GPU.
magma_int_t  magma_dgetf2_sm_batched (magma_int_t m, magma_int_t ib, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
DGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
void  magma_sgetf2trsm_batched (magma_int_t ib, magma_int_t n, float **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue) 
sgetf2trsm solves a matrix equation on the GPU.
magma_int_t  magma_sgetf2_sm_batched (magma_int_t m, magma_int_t ib, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
SGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
void  magma_zgetf2trsm_batched (magma_int_t ib, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue) 
zgetf2trsm solves a matrix equation on the GPU.
magma_int_t  magma_zgetf2_sm_batched (magma_int_t m, magma_int_t ib, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
ZGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t magma_cgetf2_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magmaFloatComplex **  dW0_displ,  
magmaFloatComplex **  dW1_displ,  
magmaFloatComplex **  dW2_displ,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
dW0_displ  (workspace) Array of pointers, dimension (batchCount).  
dW1_displ  (workspace) Array of pointers, dimension (batchCount).  
dW2_displ  (workspace) Array of pointers, dimension (batchCount).  
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
This is an internal routine that may rely on many assumptions.
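The dA_array layout described above (batchCount pointers, each to a column-major matrix with leading dimension ldda) can be illustrated with a host-only C sketch. The helper names set_batch_pointers and entry are hypothetical, C99 float complex stands in for magmaFloatComplex, and int stands in for magma_int_t. In MAGMA the matrices reside in GPU memory, which this sketch does not model; it only shows the addressing.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: point A_array[k] at the start of the k-th
   ldda-by-n column-major matrix inside one contiguous buffer. */
static void set_batch_pointers(float complex **A_array, float complex *buf,
                               int ldda, int n, int batchCount)
{
    assert(ldda >= 1 && n >= 0 && batchCount >= 0);
    for (int k = 0; k < batchCount; ++k)
        A_array[k] = buf + (size_t)k * (size_t)ldda * (size_t)n;
}

/* Hypothetical helper: address of entry (i, j), 0-based, of matrix k.
   Column-major storage means consecutive rows are adjacent in memory. */
static float complex *entry(float complex **A_array, int ldda,
                            int k, int i, int j)
{
    return &A_array[k][(size_t)i + (size_t)j * (size_t)ldda];
}
```

Storing the matrices back to back with stride ldda*n is only one possible arrangement; because each matrix is reached through its own pointer, the matrices need not be contiguous.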
magma_int_t magma_dgetf2_batched  (  magma_int_t  m, 
magma_int_t  n,  
double **  dA_array,  
magma_int_t  ldda,  
double **  dW0_displ,  
double **  dW1_displ,  
double **  dW2_displ,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
dW0_displ  (workspace) Array of pointers, dimension (batchCount).  
dW1_displ  (workspace) Array of pointers, dimension (batchCount).  
dW2_displ  (workspace) Array of pointers, dimension (batchCount).  
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
This is an internal routine that may rely on many assumptions.
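To make the factorization A = P * L * U and its packed output concrete, here is a minimal CPU reference for a single matrix, written as a sketch rather than as MAGMA's implementation. The function name lu_getf2_ref is hypothetical, int stands in for magma_int_t, and the return value follows the usual LAPACK convention (0 on success, i > 0 if the i-th pivot is exactly zero), which is an assumption here.

```c
#include <assert.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical CPU reference: unblocked, right-looking LU with partial
   pivoting. A is m-by-n, column-major, leading dimension lda.
   On exit A holds U on and above the diagonal and the multipliers of L
   below it (the unit diagonal of L is not stored). ipiv is 1-based. */
static int lu_getf2_ref(int m, int n, double *A, int lda, int *ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    int info = 0;
    int k = m < n ? m : n;
    for (int j = 0; j < k; ++j) {
        /* Pivot search: row with the largest |A(i,j)| for i >= j. */
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (abs_d(A[i + j * lda]) > abs_d(A[p + j * lda])) p = i;
        ipiv[j] = p + 1;
        if (A[p + j * lda] == 0.0) {          /* singular column */
            if (info == 0) info = j + 1;
            continue;
        }
        /* Interchange rows j and p across all n columns. */
        if (p != j)
            for (int c = 0; c < n; ++c) {
                double t = A[j + c * lda];
                A[j + c * lda] = A[p + c * lda];
                A[p + c * lda] = t;
            }
        /* Scale the column below the pivot to form the multipliers. */
        for (int i = j + 1; i < m; ++i)
            A[i + j * lda] /= A[j + j * lda];
        /* Rank-1 update of the trailing submatrix. */
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
}
```

For the 2-by-2 matrix with rows (2, 4) and (4, 2), the sketch swaps the two rows, stores the multiplier 0.5 below the diagonal, and leaves U with rows (4, 2) and (0, 3).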
magma_int_t magma_sgetf2_batched  (  magma_int_t  m, 
magma_int_t  n,  
float **  dA_array,  
magma_int_t  ldda,  
float **  dW0_displ,  
float **  dW1_displ,  
float **  dW2_displ,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
dW0_displ  (workspace) Array of pointers, dimension (batchCount).  
dW1_displ  (workspace) Array of pointers, dimension (batchCount).  
dW2_displ  (workspace) Array of pointers, dimension (batchCount).  
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
This is an internal routine that may rely on many assumptions.
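The pivot indices are a sequence of interchanges rather than a permutation vector: step i swaps row i with row IPIV(i), in order. The following sketch converts one such 1-based pivot array into a 0-based row permutation. The function name ipiv_to_perm is hypothetical, int stands in for magma_int_t, and the array is assumed to have been copied to host memory already.

```c
#include <assert.h>

/* Hypothetical helper: replay the k interchanges recorded in ipiv
   (1-based, LAPACK style) on the identity permutation of m rows.
   On exit perm[i] is the original row that ends up in row i. */
static void ipiv_to_perm(int m, int k, const int *ipiv, int *perm)
{
    for (int i = 0; i < m; ++i)
        perm[i] = i;
    for (int i = 0; i < k; ++i) {
        /* Partial pivoting only ever picks a row at or below row i. */
        assert(ipiv[i] >= i + 1 && ipiv[i] <= m);
        int p = ipiv[i] - 1;
        int t = perm[i];
        perm[i] = perm[p];
        perm[p] = t;
    }
}
```

For example, with m = 3 and ipiv = (3, 3, 3), the first step swaps rows 1 and 3, the second swaps rows 2 and 3, and the third is a no-op, giving the 0-based permutation (2, 0, 1).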
magma_int_t magma_zgetf2_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magmaDoubleComplex **  dW0_displ,  
magmaDoubleComplex **  dW1_displ,  
magmaDoubleComplex **  dW2_displ,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
dW0_displ  (workspace) Array of pointers, dimension (batchCount).  
dW1_displ  (workspace) Array of pointers, dimension (batchCount).  
dW2_displ  (workspace) Array of pointers, dimension (batchCount).  
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
This is an internal routine that may rely on many assumptions.
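Because info_array holds one status value per matrix, a caller typically scans it after the batched call. The sketch below assumes the LAPACK convention that 0 means success (the meaning of nonzero values is not spelled out in this section) and that info_array has already been copied to host memory. The function name first_nonzero_info is hypothetical, and int stands in for magma_int_t.

```c
/* Hypothetical helper: return the index of the first matrix in the
   batch whose status is nonzero, or -1 if every entry is zero. */
static int first_nonzero_info(const int *info, int batchCount)
{
    for (int k = 0; k < batchCount; ++k)
        if (info[k] != 0)
            return k;
    return -1;
}
```

A return value of -1 indicates that, under the stated assumption, no matrix in the batch reported a problem.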
void magma_cgetf2trsm_batched  (  magma_int_t  ib, 
magma_int_t  n,  
magmaFloatComplex **  dA_array,  
magma_int_t  step,  
magma_int_t  ldda,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
cgetf2trsm solves the following matrix equation on the GPU:
B = C^-1 * B
where C and B are parts of the matrix A in dA_array.
This version loads C and B into shared memory, solves the system, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.
[in]  ib  INTEGER The number of rows/columns of each matrix C, and rows of B. ib >= 0. 
[in]  n  INTEGER The number of columns of each matrix B. n >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[in]  step  INTEGER The starting address of matrix C in A. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
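The update B = C^-1 * B can be sketched on the CPU as an in-place forward substitution. This sketch assumes that C is an ib-by-ib unit lower triangular block (as is typical for the L factor in an LU panel update); the section itself does not state the shape of C. The function name unit_lower_solve is hypothetical, C99 float complex stands in for magmaFloatComplex, and where C and B sit inside A (governed by step) is not modeled.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical CPU reference: B := C^-1 * B by forward substitution.
   C is ib-by-ib, assumed unit lower triangular (its diagonal and upper
   part are never read). B is ib-by-n. Both are column-major. */
static void unit_lower_solve(int ib, int n,
                             const float complex *C, int ldc,
                             float complex *B, int ldb)
{
    assert(ib >= 0 && n >= 0 && ldc >= ib && ldb >= ib);
    for (int j = 0; j < n; ++j)              /* each right-hand side */
        for (int k = 0; k < ib; ++k)         /* eliminate unknown k  */
            for (int i = k + 1; i < ib; ++i)
                B[i + j * ldb] -= C[i + k * ldc] * B[k + j * ldb];
}
```

No division appears because the diagonal of C is taken to be 1; a non-unit triangular solve would divide B(k,j) by C(k,k) before the inner loop.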
magma_int_t magma_cgetf2_sm_batched  (  magma_int_t  m, 
magma_int_t  ib,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies it back to GPU device memory.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  ib  INTEGER The number of columns of each matrix A. ib >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
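Since this variant keeps the whole m-by-ib panel in shared memory, the panel size bounds the problem sizes it can handle. The sketch below estimates the storage for the panel alone and compares it with a caller-supplied budget. The function names panel_bytes and panel_fits are hypothetical, the 48 KiB budget used in the usage note is only an assumed figure (actual limits depend on the device), and the real kernel may need additional shared memory that is not counted here.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold an m-by-ib panel whose
   elements are elem_size bytes each (8 for single-precision complex). */
static size_t panel_bytes(int m, int ib, size_t elem_size)
{
    return (size_t)m * (size_t)ib * elem_size;
}

/* Hypothetical helper: 1 if the panel alone fits within smem_budget
   bytes of shared memory, 0 otherwise. */
static int panel_fits(int m, int ib, size_t elem_size, size_t smem_budget)
{
    return panel_bytes(m, ib, elem_size) <= smem_budget;
}
```

With 8-byte elements and an assumed budget of 49152 bytes, a 512-by-8 panel needs 32768 bytes and fits, while a 1024-by-8 panel needs 65536 bytes and does not.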
void magma_dgetf2trsm_batched  (  magma_int_t  ib, 
magma_int_t  n,  
double **  dA_array,  
magma_int_t  step,  
magma_int_t  ldda,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
dgetf2trsm solves the following matrix equation on the GPU:
B = C^-1 * B
where C and B are parts of the matrix A in dA_array.
This version loads C and B into shared memory, solves the system, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.
[in]  ib  INTEGER The number of rows/columns of each matrix C, and rows of B. ib >= 0. 
[in]  n  INTEGER The number of columns of each matrix B. n >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[in]  step  INTEGER The starting address of matrix C in A. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_dgetf2_sm_batched  (  magma_int_t  m, 
magma_int_t  ib,  
double **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies it back to GPU device memory.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  ib  INTEGER The number of columns of each matrix A. ib >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
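One way to check a factorization returned in the packed form described above is to rebuild A from the stored factors and the pivot indices. The following CPU sketch handles a single matrix; the function name rebuild_from_lu is hypothetical, int stands in for magma_int_t, and the data is assumed to have been copied to host memory.

```c
/* Hypothetical helper: given packed factors LU (m-by-n, column-major,
   leading dimension lda; unit diagonal of L implied) and 1-based ipiv,
   write A = P * L * U into A (column-major, leading dimension m). */
static void rebuild_from_lu(int m, int n, const double *LU, int lda,
                            const int *ipiv, double *A)
{
    int k = m < n ? m : n;
    /* A := L * U. Only terms t <= min(i, j) with t < k are nonzero. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int lim = i < j ? i : j;
            double s = 0.0;
            for (int t = 0; t <= lim && t < k; ++t) {
                double l = (t == i) ? 1.0 : LU[i + t * lda];
                s += l * LU[t + j * lda];
            }
            A[i + j * m] = s;
        }
    /* Undo the row interchanges in reverse order. */
    for (int i = k - 1; i >= 0; --i) {
        int p = ipiv[i] - 1;
        if (p != i)
            for (int c = 0; c < n; ++c) {
                double t = A[i + c * m];
                A[i + c * m] = A[p + c * m];
                A[p + c * m] = t;
            }
    }
}
```

For packed factors with L multiplier 0.5, U rows (4, 2) and (0, 3), and ipiv = (2, 2), the sketch recovers the matrix with rows (2, 4) and (4, 2). Comparing the rebuilt matrix with the original input, up to rounding, gives a simple residual check.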
void magma_sgetf2trsm_batched  (  magma_int_t  ib, 
magma_int_t  n,  
float **  dA_array,  
magma_int_t  step,  
magma_int_t  ldda,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
sgetf2trsm solves the following matrix equation on the GPU:
B = C^-1 * B
where C and B are parts of the matrix A in dA_array.
This version loads C and B into shared memory, solves the system, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.
[in]  ib  INTEGER The number of rows/columns of each matrix C, and rows of B. ib >= 0. 
[in]  n  INTEGER The number of columns of each matrix B. n >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[in]  step  INTEGER The starting address of matrix C in A. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_sgetf2_sm_batched  (  magma_int_t  m, 
magma_int_t  ib,  
float **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies it back to GPU device memory.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  ib  INTEGER The number of columns of each matrix A. ib >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
void magma_zgetf2trsm_batched  (  magma_int_t  ib, 
magma_int_t  n,  
magmaDoubleComplex **  dA_array,  
magma_int_t  step,  
magma_int_t  ldda,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
zgetf2trsm solves the following matrix equation on the GPU:
B = C^-1 * B
where C and B are parts of the matrix A in dA_array.
This version loads C and B into shared memory, solves the system, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.
[in]  ib  INTEGER The number of rows/columns of each matrix C, and rows of B. ib >= 0. 
[in]  n  INTEGER The number of columns of each matrix B. n >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[in]  step  INTEGER The starting address of matrix C in A. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_zgetf2_sm_batched  (  magma_int_t  m, 
magma_int_t  ib,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies it back to GPU device memory.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  ib  INTEGER The number of columns of each matrix A. ib >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 