MAGMA
2.3.0
Matrix Algebra for GPU and Multicore Architectures

Functions  
magma_int_t  magma_cgetrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_cgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magmaFloatComplex **dX_array, magma_int_t dX_length, magmaFloatComplex **dinvA_array, magma_int_t dinvA_length, magmaFloatComplex **dW1_displ, magmaFloatComplex **dW2_displ, magmaFloatComplex **dW3_displ, magmaFloatComplex **dW4_displ, magmaFloatComplex **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
This is an internal routine that may rely on many assumptions.
magma_int_t  magma_dgetrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_dgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, double **dX_array, magma_int_t dX_length, double **dinvA_array, magma_int_t dinvA_length, double **dW1_displ, double **dW2_displ, double **dW3_displ, double **dW4_displ, double **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
This is an internal routine that may rely on many assumptions.
magma_int_t  magma_sgetrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_sgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, float **dX_array, magma_int_t dX_length, float **dinvA_array, magma_int_t dinvA_length, float **dW1_displ, float **dW2_displ, float **dW3_displ, float **dW4_displ, float **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
This is an internal routine that may rely on many assumptions.
magma_int_t  magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_zgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magmaDoubleComplex **dX_array, magma_int_t dX_length, magmaDoubleComplex **dinvA_array, magma_int_t dinvA_length, magmaDoubleComplex **dW1_displ, magmaDoubleComplex **dW2_displ, magmaDoubleComplex **dW3_displ, magmaDoubleComplex **dW4_displ, magmaDoubleComplex **dW5_displ, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) 
This is an internal routine that may rely on many assumptions.
magma_int_t  magma_cgetrf_batched_smallsq_noshfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_cgetrf_batched_smallsq_shfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
cgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_dgetrf_batched_smallsq_noshfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_dgetrf_batched_smallsq_shfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
dgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_sgetrf_batched_smallsq_shfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
sgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t  magma_zgetrf_batched_smallsq_shfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
zgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
magma_int_t magma_cgetrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
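As a rough illustration of the argument layout described above, the following host-only C sketch shows how one entry of a column-major matrix with leading dimension ldda is addressed, and how an array of batchCount pointers can be built over one contiguous buffer. The helper names colmajor_index and make_pointer_array are hypothetical and not part of MAGMA. For the actual routine the matrices must reside on the GPU, and the pointer array presumably has to be in device memory as well, which this sketch does not attempt.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Offset of element (i, j), 0-based, in a column-major matrix whose
   columns are ldda elements apart (ldda >= number of rows). */
size_t colmajor_index(size_t i, size_t j, size_t ldda)
{
    return i + j * ldda;
}

/* Hypothetical helper: build an array of batchCount pointers into one
   contiguous buffer, where matrix k occupies ldda * n elements starting
   at buffer + k * ldda * n. Returns NULL if the allocation fails.
   The caller releases the pointer array with free(). */
float complex **make_pointer_array(float complex *buffer, size_t ldda,
                                   size_t n, size_t batchCount)
{
    float complex **array = malloc(batchCount * sizeof *array);
    if (array == NULL)
        return NULL;
    for (size_t k = 0; k < batchCount; ++k)
        array[k] = buffer + k * ldda * n;
    return array;
}
```

With ldda = 4 and n = 3, matrix 1 of the batch starts 12 elements into the buffer, and element (2, 1) of any matrix sits at offset 6 from that matrix's pointer.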
magma_int_t magma_cgetrf_recpanel_batched  (  magma_int_t  m, 
magma_int_t  n,  
magma_int_t  min_recpnb,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  dipiv_array,  
magma_int_t **  dpivinfo_array,  
magmaFloatComplex **  dX_array,  
magma_int_t  dX_length,  
magmaFloatComplex **  dinvA_array,  
magma_int_t  dinvA_length,  
magmaFloatComplex **  dW1_displ,  
magmaFloatComplex **  dW2_displ,  
magmaFloatComplex **  dW3_displ,  
magmaFloatComplex **  dW4_displ,  
magmaFloatComplex **  dW5_displ,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
This is an internal routine that may rely on many assumptions.
The documentation is not yet complete.
CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in]  min_recpnb  INTEGER. Internal use. The recursive block size (nb). 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  dpivinfo_array  Array of pointers, dimension (batchCount), for internal use. 
[in,out]  dX_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array X of dimension (lddx, n). On entry, should be set to 0. On exit, the solution matrix X. 
[in]  dX_length  INTEGER. The size of each workspace matrix dX. 
[in,out]  dinvA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/CTRTRI_BATCHED_NB)*CTRTRI_BATCHED_NB*CTRTRI_BATCHED_NB. If side == MagmaRight, dinvA must be of size >= ceil(n/CTRTRI_BATCHED_NB)*CTRTRI_BATCHED_NB*CTRTRI_BATCHED_NB. 
[in]  dinvA_length  INTEGER The size of each workspace matrix dinvA 
[in]  dW1_displ  Workspace array of pointers, for internal use. 
[in]  dW2_displ  Workspace array of pointers, for internal use. 
[in]  dW3_displ  Workspace array of pointers, for internal use. 
[in]  dW4_displ  Workspace array of pointers, for internal use. 
[in]  dW5_displ  Workspace array of pointers, for internal use. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER. Internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
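The lower bound on the dinvA workspace size quoted above can be evaluated with integer arithmetic, as in the sketch below. The function name dinvA_min_size is hypothetical, and the block size is passed in as a parameter because the value of CTRTRI_BATCHED_NB is not given in this page; callers would substitute the constant from their MAGMA build.

```c
#include <stddef.h>

/* Hypothetical helper: smallest element count satisfying
   size >= ceil(dim / nb) * nb * nb,
   where dim is m (side == MagmaLeft) or n (side == MagmaRight)
   and nb > 0 stands in for the *TRTRI_BATCHED_NB constant. */
size_t dinvA_min_size(size_t dim, size_t nb)
{
    size_t blocks = (dim + nb - 1) / nb;   /* integer form of ceil(dim / nb) */
    return blocks * nb * nb;
}
```

For example, dim = 100 with an assumed nb = 32 rounds up to 4 blocks, giving 4 * 32 * 32 = 4096 elements per matrix.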
magma_int_t magma_dgetrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
double **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
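To make the output format concrete, here is a plain CPU reference sketch of LU factorization with partial pivoting for a single column-major matrix. It is not MAGMA's algorithm (the routine above is a batched, right-looking Level 3 BLAS GPU implementation); it is an unblocked loop meant only to show how L and U overwrite A and how 1-based pivot indices are recorded. The names lu_partial_pivot and abs_val are hypothetical, and the return value follows the LAPACK GETRF convention (0 on success, i if U(i,i) is exactly zero), which is an assumption here.

```c
/* Absolute value without pulling in libm. */
double abs_val(double x)
{
    return x < 0.0 ? -x : x;
}

/* Unblocked LU with partial pivoting of an m-by-n column-major matrix A
   with leading dimension lda. On exit A holds L (strictly below the
   diagonal, unit diagonal implied) and U (on and above the diagonal);
   ipiv[j] is the 1-based row swapped with row j+1. */
int lu_partial_pivot(int m, int n, double *A, int lda, int *ipiv)
{
    int info = 0;
    int k = (m < n) ? m : n;
    for (int j = 0; j < k; ++j) {
        int p = j;                         /* row of largest |A(i,j)|, i >= j */
        for (int i = j + 1; i < m; ++i)
            if (abs_val(A[i + j * lda]) > abs_val(A[p + j * lda]))
                p = i;
        ipiv[j] = p + 1;
        if (A[p + j * lda] == 0.0) {       /* zero pivot: record and move on */
            if (info == 0)
                info = j + 1;
            continue;
        }
        if (p != j) {                      /* swap full rows j and p */
            for (int c = 0; c < n; ++c) {
                double t = A[j + c * lda];
                A[j + c * lda] = A[p + c * lda];
                A[p + c * lda] = t;
            }
        }
        for (int i = j + 1; i < m; ++i)    /* multipliers: column j of L */
            A[i + j * lda] /= A[j + j * lda];
        for (int c = j + 1; c < n; ++c)    /* update the trailing submatrix */
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
}
```

For the 2-by-2 matrix with rows (1, 2) and (3, 4), this produces ipiv = {2, 2}, U with rows (3, 4) and (0, 2/3), and the single multiplier L(2,1) = 1/3.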
magma_int_t magma_dgetrf_recpanel_batched  (  magma_int_t  m, 
magma_int_t  n,  
magma_int_t  min_recpnb,  
double **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  dipiv_array,  
magma_int_t **  dpivinfo_array,  
double **  dX_array,  
magma_int_t  dX_length,  
double **  dinvA_array,  
magma_int_t  dinvA_length,  
double **  dW1_displ,  
double **  dW2_displ,  
double **  dW3_displ,  
double **  dW4_displ,  
double **  dW5_displ,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
This is an internal routine that may rely on many assumptions.
The documentation is not yet complete.
DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in]  min_recpnb  INTEGER. Internal use. The recursive block size (nb). 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  dpivinfo_array  Array of pointers, dimension (batchCount), for internal use. 
[in,out]  dX_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array X of dimension (lddx, n). On entry, should be set to 0. On exit, the solution matrix X. 
[in]  dX_length  INTEGER. The size of each workspace matrix dX. 
[in,out]  dinvA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/DTRTRI_BATCHED_NB)*DTRTRI_BATCHED_NB*DTRTRI_BATCHED_NB. If side == MagmaRight, dinvA must be of size >= ceil(n/DTRTRI_BATCHED_NB)*DTRTRI_BATCHED_NB*DTRTRI_BATCHED_NB. 
[in]  dinvA_length  INTEGER The size of each workspace matrix dinvA 
[in]  dW1_displ  Workspace array of pointers, for internal use. 
[in]  dW2_displ  Workspace array of pointers, for internal use. 
[in]  dW3_displ  Workspace array of pointers, for internal use. 
[in]  dW4_displ  Workspace array of pointers, for internal use. 
[in]  dW5_displ  Workspace array of pointers, for internal use. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER. Internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_sgetrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
float **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
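The pivot indices described above are a sequence of row interchanges rather than a permutation vector. The sketch below, with the hypothetical name pivots_to_permutation, replays those 1-based swaps on the host to recover which original row ends up in each position. It assumes the pivot array for one matrix has already been copied back from the GPU.

```c
/* Replay k row interchanges (1-based indices in ipiv, as documented above)
   on the identity ordering of m rows. On exit perm[i] is the 0-based index
   of the original row that ends up in row i of the permuted matrix. */
void pivots_to_permutation(int m, int k, const int *ipiv, int *perm)
{
    for (int i = 0; i < m; ++i)
        perm[i] = i;
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;               /* convert to 0-based */
        int tmp = perm[i];
        perm[i] = perm[p];
        perm[p] = tmp;
    }
}
```

For m = 3 and ipiv = {3, 3, 3}, the swaps (rows 1 and 3, then rows 2 and 3, then a no-op) give perm = {2, 0, 1}.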
magma_int_t magma_sgetrf_recpanel_batched  (  magma_int_t  m, 
magma_int_t  n,  
magma_int_t  min_recpnb,  
float **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  dipiv_array,  
magma_int_t **  dpivinfo_array,  
float **  dX_array,  
magma_int_t  dX_length,  
float **  dinvA_array,  
magma_int_t  dinvA_length,  
float **  dW1_displ,  
float **  dW2_displ,  
float **  dW3_displ,  
float **  dW4_displ,  
float **  dW5_displ,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
This is an internal routine that may rely on many assumptions.
The documentation is not yet complete.
SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in]  min_recpnb  INTEGER. Internal use. The recursive block size (nb). 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  dpivinfo_array  Array of pointers, dimension (batchCount), for internal use. 
[in,out]  dX_array  Array of pointers, dimension (batchCount). Each is a REAL array X of dimension (lddx, n). On entry, should be set to 0. On exit, the solution matrix X. 
[in]  dX_length  INTEGER. The size of each workspace matrix dX. 
[in,out]  dinvA_array  Array of pointers, dimension (batchCount). Each is a REAL array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/STRTRI_BATCHED_NB)*STRTRI_BATCHED_NB*STRTRI_BATCHED_NB. If side == MagmaRight, dinvA must be of size >= ceil(n/STRTRI_BATCHED_NB)*STRTRI_BATCHED_NB*STRTRI_BATCHED_NB. 
[in]  dinvA_length  INTEGER The size of each workspace matrix dinvA 
[in]  dW1_displ  Workspace array of pointers, for internal use. 
[in]  dW2_displ  Workspace array of pointers, for internal use. 
[in]  dW3_displ  Workspace array of pointers, for internal use. 
[in]  dW4_displ  Workspace array of pointers, for internal use. 
[in]  dW5_displ  Workspace array of pointers, for internal use. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER. Internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_zgetrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
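Because the unit diagonal of L is not stored, L and U share the storage of A on exit. The following host-side sketch separates them again for one matrix that has been copied back from the GPU. The function name unpack_lu is hypothetical, and C99 double complex stands in for magmaDoubleComplex, which is an assumption made to keep the example free of MAGMA headers.

```c
#include <complex.h>

/* Split the packed m-by-n factor stored in A (column-major, leading
   dimension lda) into L (m-by-k, leading dimension m) and U (k-by-n,
   leading dimension k), with k = min(m, n). L receives the implied
   unit diagonal; entries outside the triangles are set to zero. */
void unpack_lu(int m, int n, const double complex *A, int lda,
               double complex *L, double complex *U)
{
    int k = (m < n) ? m : n;
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < m; ++i)
            L[i + j * m] = (i > j) ? A[i + j * lda]
                                   : (i == j ? 1.0 : 0.0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < k; ++i)
            U[i + j * k] = (i <= j) ? A[i + j * lda] : 0.0;
}
```

For m > n this yields a lower trapezoidal L and a square upper triangular U; for m < n it yields a square L and an upper trapezoidal U, matching the shapes described above.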
magma_int_t magma_zgetrf_recpanel_batched  (  magma_int_t  m, 
magma_int_t  n,  
magma_int_t  min_recpnb,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  dipiv_array,  
magma_int_t **  dpivinfo_array,  
magmaDoubleComplex **  dX_array,  
magma_int_t  dX_length,  
magmaDoubleComplex **  dinvA_array,  
magma_int_t  dinvA_length,  
magmaDoubleComplex **  dW1_displ,  
magmaDoubleComplex **  dW2_displ,  
magmaDoubleComplex **  dW3_displ,  
magmaDoubleComplex **  dW4_displ,  
magmaDoubleComplex **  dW5_displ,  
magma_int_t *  info_array,  
magma_int_t  gbstep,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
This is an internal routine that may rely on many assumptions.
The documentation is not yet complete.
ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  m  INTEGER The number of rows of each matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of each matrix A. N >= 0. 
[in]  min_recpnb  INTEGER. Internal use. The recursive block size (nb). 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,M). 
[out]  dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). 
[out]  dpivinfo_array  Array of pointers, dimension (batchCount), for internal use. 
[in,out]  dX_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array X of dimension (lddx, n). On entry, should be set to 0. On exit, the solution matrix X. 
[in]  dX_length  INTEGER. The size of each workspace matrix dX. 
[in,out]  dinvA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array dinvA, a workspace on device. If side == MagmaLeft, dinvA must be of size >= ceil(m/ZTRTRI_BATCHED_NB)*ZTRTRI_BATCHED_NB*ZTRTRI_BATCHED_NB. If side == MagmaRight, dinvA must be of size >= ceil(n/ZTRTRI_BATCHED_NB)*ZTRTRI_BATCHED_NB*ZTRTRI_BATCHED_NB. 
[in]  dinvA_length  INTEGER The size of each workspace matrix dinvA 
[in]  dW1_displ  Workspace array of pointers, for internal use. 
[in]  dW2_displ  Workspace array of pointers, for internal use. 
[in]  dW3_displ  Workspace array of pointers, for internal use. 
[in]  dW4_displ  Workspace array of pointers, for internal use. 
[in]  dW5_displ  Workspace array of pointers, for internal use. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  gbstep  INTEGER. Internal use. 
[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_cgetrf_batched_smallsq_noshfl  (  magma_int_t  n, 
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
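Since the small-square routines accept only sizes up to 32, a caller choosing between them and the general batched routine needs a size check first. The sketch below shows one such guard; the names SMALLSQ_MAX_N and smallsq_applicable are hypothetical, and whether MAGMA performs an equivalent check internally is not stated in this page.

```c
/* Size limit quoted in the routine description above. */
enum { SMALLSQ_MAX_N = 32 };

/* Returns 1 if an n-by-n problem is within the documented range of the
   small-square routines (0 <= n <= 32), and 0 otherwise. */
int smallsq_applicable(int n)
{
    return n >= 0 && n <= SMALLSQ_MAX_N;
}
```

A caller could fall back to the general M-by-N batched routine with m = n whenever this returns 0 for a non-negative n.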
magma_int_t magma_cgetrf_batched_smallsq_shfl  (  magma_int_t  n, 
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
cgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_dgetrf_batched_smallsq_noshfl  (  magma_int_t  n, 
double **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. 
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N). 
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
magma_int_t magma_dgetrf_batched_smallsq_shfl  (  magma_int_t  n, 
double **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
dgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
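The "array of pointers, dimension (batchCount)" layout can be illustrated with a small host-side sketch. It assumes all matrices sit back to back in one buffer, each occupying ldda * n elements; the helper name build_batch_pointers is hypothetical. In actual use the matrices live in GPU memory, and the pointer array is typically expected to be in GPU memory as well, so this only shows the addressing arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, shown on the host for illustration only: fill
   ptrs[i] with the start of matrix i inside one contiguous buffer that
   stores batchCount column-major matrices of ldda * n elements each. */
void build_batch_pointers(double *base, double **ptrs,
                          int ldda, int n, int batchCount)
{
    assert(base != NULL && ptrs != NULL);
    assert(n >= 0 && ldda >= (n > 1 ? n : 1) && batchCount >= 0);
    for (int i = 0; i < batchCount; ++i)
        ptrs[i] = base + (size_t)i * (size_t)ldda * (size_t)n;
}
```

Padding ldda beyond n changes only the stride between columns and between matrices; element (r, c) of matrix i is still ptrs[i][r + c * ldda].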
magma_int_t magma_sgetrf_batched_smallsq_noshfl  (  magma_int_t  n, 
float **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
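The meaning of the pivot indices ("row i of the matrix was interchanged with row IPIV(i)") can be made concrete by replaying the interchanges on a right-hand-side vector, as LAPACK's xGETRS does before its triangular solves. The sketch below is a host-side illustration; the helper name apply_pivots is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: replay the row interchanges recorded in ipiv
   (1-based, length n) on vector b, in forward order i = 1..n. */
void apply_pivots(int n, const int *ipiv, float *b)
{
    assert(n >= 0 && (n == 0 || (ipiv != NULL && b != NULL)));
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;             /* convert to 0-based row index */
        assert(p >= i && p < n);         /* a pivot row is never above row i */
        if (p != i) {
            float t = b[i];
            b[i] = b[p];
            b[p] = t;
        }
    }
}
```

Because the swaps are applied sequentially, ipiv is not a permutation vector: the same row index may appear more than once.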
magma_int_t magma_sgetrf_batched_smallsq_shfl  (  magma_int_t  n, 
float **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
sgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
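Since the output overwrites A with both factors and omits the unit diagonal of L, it can help to see how the original matrix is recovered from the packed result. The sketch below is a host-side check under the assumption of column-major storage; the name rebuild_from_lu is hypothetical. It forms L*U from the packed array and then undoes the row interchanges in reverse order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side check: given packed factors LU (unit-diagonal L
   strictly below the diagonal, U on and above it) and 1-based pivots,
   write the original matrix A = P*L*U into out. */
void rebuild_from_lu(int n, const float *LU, int lda,
                     const int *ipiv, float *out, int ldo)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1) && ldo >= (n > 1 ? n : 1));
    /* out = L * U; only k <= min(i, j) contributes to entry (i, j). */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            int kmax = (i < j) ? i : j;
            float s = 0.0f;
            for (int k = 0; k <= kmax; ++k) {
                float l = (k == i) ? 1.0f : LU[i + k * lda]; /* L(i,k) */
                s += l * LU[k + j * lda];                    /* U(k,j) */
            }
            out[i + j * ldo] = s;
        }
    }
    /* Undo the interchanges: last swap first. */
    for (int i = n - 1; i >= 0; --i) {
        int p = ipiv[i] - 1;
        if (p != i) {
            for (int j = 0; j < n; ++j) {
                float t = out[i + j * ldo];
                out[i + j * ldo] = out[p + j * ldo];
                out[p + j * ldo] = t;
            }
        }
    }
}
```

Comparing the rebuilt matrix against a saved copy of the input gives a simple residual test for each matrix in a batch.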
magma_int_t magma_zgetrf_batched_smallsq_noshfl  (  magma_int_t  n, 
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
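For complex matrices, "partial pivoting" needs a definition of which entry counts as largest. The reference BLAS routine IZAMAX compares |re| + |im| rather than the true modulus; whether this MAGMA routine uses the same measure is not stated here, so the sketch below is an assumption-laden illustration. The cplx struct and the names abs1 and pivot_row are hypothetical stand-ins, not MAGMA types or functions.

```c
#include <assert.h>
#include <math.h>

/* Stand-in for a double-complex value (real part x, imaginary part y);
   a hypothetical type for this sketch only. */
typedef struct { double x, y; } cplx;

/* |re| + |im|, the magnitude measure used by the reference BLAS IZAMAX. */
static double abs1(cplx z) { return fabs(z.x) + fabs(z.y); }

/* Return the 0-based row index in col[k..n-1] with the largest abs1 value;
   ties go to the first occurrence. */
int pivot_row(int n, const cplx *col, int k)
{
    assert(k >= 0 && k < n);
    int p = k;
    for (int i = k + 1; i < n; ++i)
        if (abs1(col[i]) > abs1(col[p]))
            p = i;
    return p;
}
```

The two measures can disagree: for the column (3, 2+2i, -3.5i), |re| + |im| selects the second entry, while the true modulus would select the third. Pivot sequences may therefore differ between implementations even when both produce valid factorizations.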
magma_int_t magma_zgetrf_batched_smallsq_shfl  (  magma_int_t  n, 
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magma_int_t **  ipiv_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
zgetrf_batched_smallsq_shfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
[in]  n  INTEGER The size of each matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer refers to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]  ldda  INTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]  ipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 