MAGMA  2.3.0 Matrix Algebra for GPU and Multicore Architectures
getf2: LU panel factorization

## Functions

magma_int_t magma_cgetf2_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dW0_displ, magmaFloatComplex **dW1_displ, magmaFloatComplex **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
CGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

magma_int_t magma_dgetf2_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dW0_displ, double **dW1_displ, double **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
DGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

magma_int_t magma_sgetf2_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dW0_displ, float **dW1_displ, float **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
SGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

magma_int_t magma_zgetf2_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dW0_displ, magmaDoubleComplex **dW1_displ, magmaDoubleComplex **dW2_displ, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
ZGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

void magma_cgetf2trsm_batched (magma_int_t ib, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue)
cgetf2trsm solves the matrix equation B = C^-1 * B on the GPU.

magma_int_t magma_cgetf2_sm_batched (magma_int_t m, magma_int_t ib, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
CGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

void magma_dgetf2trsm_batched (magma_int_t ib, magma_int_t n, double **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue)
dgetf2trsm solves the matrix equation B = C^-1 * B on the GPU.

magma_int_t magma_dgetf2_sm_batched (magma_int_t m, magma_int_t ib, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
DGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

void magma_sgetf2trsm_batched (magma_int_t ib, magma_int_t n, float **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue)
sgetf2trsm solves the matrix equation B = C^-1 * B on the GPU.

magma_int_t magma_sgetf2_sm_batched (magma_int_t m, magma_int_t ib, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

void magma_zgetf2trsm_batched (magma_int_t ib, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue)
zgetf2trsm solves the matrix equation B = C^-1 * B on the GPU.

magma_int_t magma_zgetf2_sm_batched (magma_int_t m, magma_int_t ib, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

## Function Documentation

 magma_int_t magma_cgetf2_batched ( magma_int_t m, magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magmaFloatComplex ** dW0_displ, magmaFloatComplex ** dW1_displ, magmaFloatComplex ** dW2_displ, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

CGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] n: INTEGER. The number of columns of each matrix A. N >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- dW0_displ: (workspace) Array of pointers, dimension (batchCount).
- dW1_displ: (workspace) Array of pointers, dimension (batchCount).
- dW2_displ: (workspace) Array of pointers, dimension (batchCount).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] gbstep: INTEGER. Internal use.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

This is an internal routine that may rely on many assumptions.
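
As a minimal sketch of how the per-matrix values in info_array can be interpreted, the plain C helpers below classify one value and count the exactly singular factors in a batch. The names `lu_status`, `classify_info`, and `count_singular` are hypothetical and not part of MAGMA; the sketch uses `int` in place of magma_int_t and assumes the values have already been copied into host memory (the copy itself is not shown).

```c
/* Hypothetical status codes for this sketch; not MAGMA definitions. */
typedef enum { LU_OK = 0, LU_BAD_ARGUMENT = 1, LU_SINGULAR = 2 } lu_status;

/* Classify one info value using the convention documented for info_array. */
lu_status classify_info(int info)
{
    if (info == 0) return LU_OK;            /* successful exit */
    if (info < 0)  return LU_BAD_ARGUMENT;  /* argument -info illegal, or another error */
    return LU_SINGULAR;                     /* U(info,info) is exactly zero */
}

/* Count the matrices in a batch whose factor U is exactly singular.
   info_host is assumed to be a host copy of info_array. */
int count_singular(const int *info_host, int batchCount)
{
    int count = 0;
    for (int i = 0; i < batchCount; ++i)
        if (classify_info(info_host[i]) == LU_SINGULAR)
            ++count;
    return count;
}
```

Checking every entry matters because the batch call reports status per matrix: one singular matrix does not imply that the others failed.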

 magma_int_t magma_dgetf2_batched ( magma_int_t m, magma_int_t n, double ** dA_array, magma_int_t ldda, double ** dW0_displ, double ** dW1_displ, double ** dW2_displ, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

DGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] n: INTEGER. The number of columns of each matrix A. N >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- dW0_displ: (workspace) Array of pointers, dimension (batchCount).
- dW1_displ: (workspace) Array of pointers, dimension (batchCount).
- dW2_displ: (workspace) Array of pointers, dimension (batchCount).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] gbstep: INTEGER. Internal use.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

This is an internal routine that may rely on many assumptions.
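
For reference, the factorization described above can be sketched on the CPU for a single column-major matrix. This is an illustrative unblocked implementation, not MAGMA's GPU code: the function name `lu_partial_pivot` is hypothetical, and `int` stands in for magma_int_t. It follows the documented conventions: the packed factors L and U overwrite A, pivot indices are 1-based, and the return value mirrors INFO > 0 when some U(i,i) is exactly zero.

```c
#include <math.h>

/* CPU reference sketch (not MAGMA code): LU with partial pivoting of one
   column-major m-by-n matrix A with leading dimension lda.
   Returns 0 on success, or i > 0 for the first i with U(i,i) exactly zero. */
int lu_partial_pivot(int m, int n, double *A, int lda, int *ipiv)
{
    int info = 0;
    int k = (m < n) ? m : n;
    for (int j = 0; j < k; ++j) {
        /* Find the pivot: the entry of largest magnitude in column j, rows j..m-1. */
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (fabs(A[i + j * lda]) > fabs(A[p + j * lda]))
                p = i;
        ipiv[j] = p + 1;                 /* store as a 1-based row index */

        if (A[p + j * lda] == 0.0) {
            if (info == 0) info = j + 1; /* record the first zero pivot */
            continue;                    /* column is already zero below row j */
        }
        /* Swap rows j and p across all n columns. */
        if (p != j)
            for (int c = 0; c < n; ++c) {
                double t = A[j + c * lda];
                A[j + c * lda] = A[p + c * lda];
                A[p + c * lda] = t;
            }
        /* Scale the multipliers; they form column j of L. */
        for (int i = j + 1; i < m; ++i)
            A[i + j * lda] /= A[j + j * lda];
        /* Rank-1 update of the trailing submatrix. */
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
}
```

A batched call is conceptually this routine applied independently to each of the batchCount matrices, with one ipiv array and one info value per matrix.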

 magma_int_t magma_sgetf2_batched ( magma_int_t m, magma_int_t n, float ** dA_array, magma_int_t ldda, float ** dW0_displ, float ** dW1_displ, float ** dW2_displ, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

SGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] n: INTEGER. The number of columns of each matrix A. N >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- dW0_displ: (workspace) Array of pointers, dimension (batchCount).
- dW1_displ: (workspace) Array of pointers, dimension (batchCount).
- dW2_displ: (workspace) Array of pointers, dimension (batchCount).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] gbstep: INTEGER. Internal use.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

This is an internal routine that may rely on many assumptions.
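
The pivot indices record a sequence of row interchanges, not a final permutation, so they have to be replayed in order. The sketch below applies them to a column-major single-precision matrix on the CPU, for example to permute a right-hand side the same way the factored matrix was permuted. The function name `apply_row_pivots` is hypothetical, `int` stands in for magma_int_t, and the pivots are assumed to be a host copy of one entry of ipiv_array.

```c
/* Replay the interchanges stored in ipiv on a column-major matrix B with
   n columns and leading dimension ldb: for i = 1..k, swap row i with
   row ipiv[i-1] (both 1-based, as in the parameter description). */
void apply_row_pivots(int n, float *B, int ldb, const int *ipiv, int k)
{
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;      /* convert the 1-based pivot index to 0-based */
        if (p == i)
            continue;             /* no interchange at this step */
        for (int c = 0; c < n; ++c) {
            float t = B[i + c * ldb];
            B[i + c * ldb] = B[p + c * ldb];
            B[p + c * ldb] = t;
        }
    }
}
```

Because later swaps act on rows that earlier swaps may already have moved, applying the interchanges out of order generally gives a different result.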

 magma_int_t magma_zgetf2_batched ( magma_int_t m, magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magmaDoubleComplex ** dW0_displ, magmaDoubleComplex ** dW1_displ, magmaDoubleComplex ** dW2_displ, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

ZGETF2 computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] n: INTEGER. The number of columns of each matrix A. N >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- dW0_displ: (workspace) Array of pointers, dimension (batchCount).
- dW1_displ: (workspace) Array of pointers, dimension (batchCount).
- dW2_displ: (workspace) Array of pointers, dimension (batchCount).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] gbstep: INTEGER. Internal use.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

This is an internal routine that may rely on many assumptions.
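
The array-of-pointers layout used by dA_array can be illustrated with plain address arithmetic. The sketch below is host-only and makes several assumptions: the matrices are stored back to back in one contiguous buffer, C99 `double complex` stands in for magmaDoubleComplex, and the function name `set_batch_pointers` is hypothetical. In MAGMA the matrices and the pointer array reside in GPU memory, which this sketch does not attempt to show.

```c
#include <complex.h>
#include <stddef.h>

/* Fill ptrs[i] with the start of the i-th matrix inside one contiguous
   buffer holding batchCount column-major matrices, each occupying
   ldda * n elements. */
void set_batch_pointers(double complex **ptrs, double complex *buffer,
                        int ldda, int n, int batchCount)
{
    size_t stride = (size_t)ldda * (size_t)n;   /* elements per matrix */
    for (int i = 0; i < batchCount; ++i)
        ptrs[i] = buffer + (size_t)i * stride;
}
```

A contiguous buffer is only one possible arrangement; since each entry is an independent pointer, the matrices could equally live in separate allocations.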

 void magma_cgetf2trsm_batched ( magma_int_t ib, magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue )

cgetf2trsm solves the matrix equation on the GPU

B = C^-1 * B

where C and B are parts of the matrix A in dA_array.

This version loads C and B into shared memory, solves the system there, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.

Parameters

- [in] ib: INTEGER. The number of rows/columns of each matrix C, and rows of B. ib >= 0.
- [in] n: INTEGER. The number of columns of each matrix B. n >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [in] step: INTEGER. The starting address of matrix C in A.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 magma_int_t magma_cgetf2_sm_batched ( magma_int_t m, magma_int_t ib, magmaFloatComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

CGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies the result back to GPU device memory.
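
Because the whole m-by-ib panel must fit in shared memory, a quick size estimate shows which panel shapes are plausible. The helpers below are a rough sketch: the names `panel_bytes` and `panel_fits` are hypothetical, the estimate counts only the panel itself (the real routine may need additional workspace), and any limit passed in, such as 48 KiB, is an assumption about the target GPU rather than a value taken from MAGMA.

```c
#include <stddef.h>

/* Bytes needed to hold one m-by-ib panel with elements of elem_size bytes. */
size_t panel_bytes(int m, int ib, size_t elem_size)
{
    return (size_t)m * (size_t)ib * elem_size;
}

/* Returns 1 if the panel alone fits within `limit` bytes, 0 otherwise. */
int panel_fits(int m, int ib, size_t elem_size, size_t limit)
{
    return panel_bytes(m, ib, elem_size) <= limit;
}
```

For a single-precision complex element of 8 bytes (two 4-byte floats), a 512-by-8 panel needs 32768 bytes, while a 1024-by-8 panel needs 65536 bytes and would exceed an assumed 48 KiB limit.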

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] ib: INTEGER. The number of columns of each matrix A. ib >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 void magma_dgetf2trsm_batched ( magma_int_t ib, magma_int_t n, double ** dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue )

dgetf2trsm solves the matrix equation on the GPU

B = C^-1 * B

where C and B are parts of the matrix A in dA_array.

This version loads C and B into shared memory, solves the system there, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.
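
The update B = C^-1 * B can be sketched on the CPU as a forward substitution. The layout used here is an assumption made for illustration, not something this page states: C is taken to be the ib-by-ib unit lower triangular block of A that starts at row and column `step`, and B the ib-by-n block immediately to its right. The function name `unit_lower_solve_in_place` is hypothetical and `int` stands in for magma_int_t.

```c
#include <stddef.h>

/* CPU reference sketch (not the MAGMA kernel): overwrite B with C^-1 * B.
   ASSUMED layout: C = A(step : step+ib-1, step : step+ib-1), unit lower
   triangular (its diagonal is treated as 1 and never read);
   B = A(step : step+ib-1, step+ib : step+ib+n-1). A is column-major. */
void unit_lower_solve_in_place(int ib, int n, double *A, int step, int lda)
{
    const double *C = A + step + (size_t)step * (size_t)lda;
    double *B = A + step + (size_t)(step + ib) * (size_t)lda;
    for (int c = 0; c < n; ++c)                  /* columns of B are independent */
        for (int j = 0; j < ib; ++j)             /* forward substitution */
            for (int i = j + 1; i < ib; ++i)
                B[i + (size_t)c * lda] -= C[i + (size_t)j * lda] * B[j + (size_t)c * lda];
}
```

Since the columns of B are independent, this is the kind of small, regular workload that maps naturally onto one thread block per matrix with C and B held in shared memory.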

Parameters

- [in] ib: INTEGER. The number of rows/columns of each matrix C, and rows of B. ib >= 0.
- [in] n: INTEGER. The number of columns of each matrix B. n >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [in] step: INTEGER. The starting address of matrix C in A.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 magma_int_t magma_dgetf2_sm_batched ( magma_int_t m, magma_int_t ib, double ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

DGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies the result back to GPU device memory.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] ib: INTEGER. The number of columns of each matrix A. ib >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 void magma_sgetf2trsm_batched ( magma_int_t ib, magma_int_t n, float ** dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue )

sgetf2trsm solves the matrix equation on the GPU

B = C^-1 * B

where C and B are parts of the matrix A in dA_array.

This version loads C and B into shared memory, solves the system there, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.

Parameters

- [in] ib: INTEGER. The number of rows/columns of each matrix C, and rows of B. ib >= 0.
- [in] n: INTEGER. The number of columns of each matrix B. n >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [in] step: INTEGER. The starting address of matrix C in A.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 magma_int_t magma_sgetf2_sm_batched ( magma_int_t m, magma_int_t ib, float ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

SGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies the result back to GPU device memory.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] ib: INTEGER. The number of columns of each matrix A. ib >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 void magma_zgetf2trsm_batched ( magma_int_t ib, magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t step, magma_int_t ldda, magma_int_t batchCount, magma_queue_t queue )

zgetf2trsm solves the matrix equation on the GPU

B = C^-1 * B

where C and B are parts of the matrix A in dA_array.

This version loads C and B into shared memory, solves the system there, and copies the result back to GPU device memory. This is an internal routine that may rely on many assumptions.

Parameters

- [in] ib: INTEGER. The number of rows/columns of each matrix C, and rows of B. ib >= 0.
- [in] n: INTEGER. The number of columns of each matrix B. n >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [in] step: INTEGER. The starting address of matrix C in A.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.

 magma_int_t magma_zgetf2_sm_batched ( magma_int_t m, magma_int_t ib, magmaDoubleComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

ZGETF2_SM computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

This version loads the entire m-by-ib matrix into shared memory, factorizes it with pivoting, and copies the result back to GPU device memory.

Parameters

- [in] m: INTEGER. The number of rows of each matrix A. M >= 0.
- [in] ib: INTEGER. The number of columns of each matrix A. ib >= 0.
- [in,out] dA_array: Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
- [in] ldda: INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
- [out] ipiv_array: Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
- [out] info_array: Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  - INFO = 0: successful exit.
  - INFO < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  - INFO > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
- [in] batchCount: INTEGER. The number of matrices to operate on.
- [in] queue: magma_queue_t. Queue to execute in.
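
Since the factors are returned packed in one array and the unit diagonal of L is not stored, it can help to see how explicit L and U are recovered. The sketch below does this on the CPU for one column-major matrix. The function name `unpack_lu` is hypothetical, C99 `double complex` stands in for magmaDoubleComplex, and the packed data is assumed to have been copied to host memory.

```c
#include <complex.h>

/* Split packed LU storage (column-major, leading dimension lda) of an
   m-by-n factorization into explicit L (m-by-k, leading dimension m)
   and U (k-by-n, leading dimension k), where k = min(m, n). */
void unpack_lu(int m, int n, const double complex *LU, int lda,
               double complex *L, double complex *U)
{
    int k = (m < n) ? m : n;
    /* L: strictly lower part of LU plus an explicit unit diagonal. */
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < m; ++i) {
            if (i > j)       L[i + j * m] = LU[i + j * lda];
            else if (i == j) L[i + j * m] = 1.0;
            else             L[i + j * m] = 0.0;
        }
    /* U: upper part of LU, including the diagonal. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < k; ++i)
            U[i + j * k] = (i <= j) ? LU[i + j * lda] : 0.0;
}
```

With explicit factors in hand, multiplying L by U and undoing the recorded row interchanges should reproduce the original matrix up to rounding, which is a common way to validate a factorization.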