MAGMA  2.3.0 Matrix Algebra for GPU and Multicore Architectures
getrf: LU factorization

## Functions

magma_int_t magma_cgetrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_cgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magmaFloatComplex **dX_array, magma_int_t dX_length, magmaFloatComplex **dinvA_array, magma_int_t dinvA_length, magmaFloatComplex **dW1_displ, magmaFloatComplex **dW2_displ, magmaFloatComplex **dW3_displ, magmaFloatComplex **dW4_displ, magmaFloatComplex **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
This is an internal routine that might have many assumption. More...

magma_int_t magma_dgetrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_dgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, double **dX_array, magma_int_t dX_length, double **dinvA_array, magma_int_t dinvA_length, double **dW1_displ, double **dW2_displ, double **dW3_displ, double **dW4_displ, double **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
This is an internal routine that might have many assumption. More...

magma_int_t magma_sgetrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_sgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, float **dX_array, magma_int_t dX_length, float **dinvA_array, magma_int_t dinvA_length, float **dW1_displ, float **dW2_displ, float **dW3_displ, float **dW4_displ, float **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
This is an internal routine that might have many assumption. More...

magma_int_t magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. More...

This is an internal routine that might have many assumption. More...

magma_int_t magma_cgetrf_batched_smallsq_noshfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_cgetrf_batched_smallsq_shfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_dgetrf_batched_smallsq_noshfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_dgetrf_batched_smallsq_shfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_sgetrf_batched_smallsq_shfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

magma_int_t magma_zgetrf_batched_smallsq_shfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. More...

## Function Documentation

 magma_int_t magma_cgetrf_batched ( magma_int_t m, magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_cgetrf_recpanel_batched ( magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex ** dA_array, magma_int_t ldda, magma_int_t ** dipiv_array, magma_int_t ** dpivinfo_array, magmaFloatComplex ** dX_array, magma_int_t dX_length, magmaFloatComplex ** dinvA_array, magma_int_t dinvA_length, magmaFloatComplex ** dW1_displ, magmaFloatComplex ** dW2_displ, magmaFloatComplex ** dW3_displ, magmaFloatComplex ** dW4_displ, magmaFloatComplex ** dW5_displ, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

This is an internal routine that might have many assumption.

Documentation is not fully completed

CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in] min_recpnb INTEGER. Internal use. The recursive nb [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use. [in,out] dX_array Array of pointers, dimension (batchCount). Each is a COMPLEX array X of dimension ( lddx, n ). On entry, should be set to 0 On exit, the solution matrix X [in] dX_length INTEGER. The size of each workspace matrix dX [in,out] dinvA_array Array of pointers, dimension (batchCount). Each is a COMPLEX array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/CTRTRI_BATCHED_NB)*CTRTRI_BATCHED_NB*CTRTRI_BATCHED_NB, If side == MagmaRight, dinvA must be of size >= ceil(n/CTRTRI_BATCHED_NB)*CTRTRI_BATCHED_NB*CTRTRI_BATCHED_NB, [in] dinvA_length INTEGER The size of each workspace matrix dinvA [in] dW1_displ Workspace array of pointers, for internal use. [in] dW2_displ Workspace array of pointers, for internal use. [in] dW3_displ Workspace array of pointers, for internal use. [in] dW4_displ Workspace array of pointers, for internal use. [in] dW5_displ Workspace array of pointers, for internal use. [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] gbstep INTEGER internal use. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_dgetrf_batched ( magma_int_t m, magma_int_t n, double ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_dgetrf_recpanel_batched ( magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double ** dA_array, magma_int_t ldda, magma_int_t ** dipiv_array, magma_int_t ** dpivinfo_array, double ** dX_array, magma_int_t dX_length, double ** dinvA_array, magma_int_t dinvA_length, double ** dW1_displ, double ** dW2_displ, double ** dW3_displ, double ** dW4_displ, double ** dW5_displ, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

This is an internal routine that might have many assumption.

Documentation is not fully completed

DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in] min_recpnb INTEGER. Internal use. The recursive nb [in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use. [in,out] dX_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array X of dimension ( lddx, n ). On entry, should be set to 0 On exit, the solution matrix X [in] dX_length INTEGER. The size of each workspace matrix dX [in,out] dinvA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/DTRTRI_BATCHED_NB)*DTRTRI_BATCHED_NB*DTRTRI_BATCHED_NB, If side == MagmaRight, dinvA must be of size >= ceil(n/DTRTRI_BATCHED_NB)*DTRTRI_BATCHED_NB*DTRTRI_BATCHED_NB, [in] dinvA_length INTEGER The size of each workspace matrix dinvA [in] dW1_displ Workspace array of pointers, for internal use. [in] dW2_displ Workspace array of pointers, for internal use. [in] dW3_displ Workspace array of pointers, for internal use. [in] dW4_displ Workspace array of pointers, for internal use. [in] dW5_displ Workspace array of pointers, for internal use. [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] gbstep INTEGER internal use. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_sgetrf_batched ( magma_int_t m, magma_int_t n, float ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_sgetrf_recpanel_batched ( magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float ** dA_array, magma_int_t ldda, magma_int_t ** dipiv_array, magma_int_t ** dpivinfo_array, float ** dX_array, magma_int_t dX_length, float ** dinvA_array, magma_int_t dinvA_length, float ** dW1_displ, float ** dW2_displ, float ** dW3_displ, float ** dW4_displ, float ** dW5_displ, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

This is an internal routine that might have many assumption.

Documentation is not fully completed

SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in] min_recpnb INTEGER. Internal use. The recursive nb [in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use. [in,out] dX_array Array of pointers, dimension (batchCount). Each is a REAL array X of dimension ( lddx, n ). On entry, should be set to 0 On exit, the solution matrix X [in] dX_length INTEGER. The size of each workspace matrix dX [in,out] dinvA_array Array of pointers, dimension (batchCount). Each is a REAL array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/STRTRI_BATCHED_NB)*STRTRI_BATCHED_NB*STRTRI_BATCHED_NB, If side == MagmaRight, dinvA must be of size >= ceil(n/STRTRI_BATCHED_NB)*STRTRI_BATCHED_NB*STRTRI_BATCHED_NB, [in] dinvA_length INTEGER The size of each workspace matrix dinvA [in] dW1_displ Workspace array of pointers, for internal use. [in] dW2_displ Workspace array of pointers, for internal use. [in] dW3_displ Workspace array of pointers, for internal use. [in] dW4_displ Workspace array of pointers, for internal use. [in] dW5_displ Workspace array of pointers, for internal use. [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] gbstep INTEGER internal use. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_zgetrf_batched ( magma_int_t m, magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_zgetrf_recpanel_batched ( magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex ** dA_array, magma_int_t ldda, magma_int_t ** dipiv_array, magma_int_t ** dpivinfo_array, magmaDoubleComplex ** dX_array, magma_int_t dX_length, magmaDoubleComplex ** dinvA_array, magma_int_t dinvA_length, magmaDoubleComplex ** dW1_displ, magmaDoubleComplex ** dW2_displ, magmaDoubleComplex ** dW3_displ, magmaDoubleComplex ** dW4_displ, magmaDoubleComplex ** dW5_displ, magma_int_t * info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue )

This is an internal routine that might have many assumption.

Documentation is not fully completed

ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] m INTEGER The number of rows of each matrix A. M >= 0. [in] n INTEGER The number of columns of each matrix A. N >= 0. [in] min_recpnb INTEGER. Internal use. The recursive nb [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use. [in,out] dX_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array X of dimension ( lddx, n ). On entry, should be set to 0 On exit, the solution matrix X [in] dX_length INTEGER. The size of each workspace matrix dX [in,out] dinvA_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/ZTRTRI_BATCHED_NB)*ZTRTRI_BATCHED_NB*ZTRTRI_BATCHED_NB, If side == MagmaRight, dinvA must be of size >= ceil(n/ZTRTRI_BATCHED_NB)*ZTRTRI_BATCHED_NB*ZTRTRI_BATCHED_NB, [in] dinvA_length INTEGER The size of each workspace matrix dinvA [in] dW1_displ Workspace array of pointers, for internal use. [in] dW2_displ Workspace array of pointers, for internal use. [in] dW3_displ Workspace array of pointers, for internal use. [in] dW4_displ Workspace array of pointers, for internal use. [in] dW5_displ Workspace array of pointers, for internal use. [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] gbstep INTEGER internal use. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_cgetrf_batched_smallsq_noshfl ( magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_cgetrf_batched_smallsq_shfl ( magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_dgetrf_batched_smallsq_noshfl ( magma_int_t n, double ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_dgetrf_batched_smallsq_shfl ( magma_int_t n, double ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_sgetrf_batched_smallsq_noshfl ( magma_int_t n, float ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_sgetrf_batched_smallsq_shfl ( magma_int_t n, float ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_zgetrf_batched_smallsq_noshfl ( magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.
 magma_int_t magma_zgetrf_batched_smallsq_shfl ( magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magma_int_t ** ipiv_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
 [in] n INTEGER The size of each matrix A. N >= 0. [in,out] dA_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. [in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M). [out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). [out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed. > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. [in] batchCount INTEGER The number of matrices to operate on. [in] queue magma_queue_t Queue to execute in.