MAGMA
2.3.0
Matrix Algebra for GPU and Multicore Architectures

Functions  
magma_int_t  magma_cgeqrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_cgeqrf_expert_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dR_array, magma_int_t lddr, magmaFloatComplex **dT_array, magma_int_t lddt, magmaFloatComplex **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_dgeqrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_dgeqrf_expert_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dR_array, magma_int_t lddr, double **dT_array, magma_int_t lddt, double **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_sgeqrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_sgeqrf_expert_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dR_array, magma_int_t lddr, float **dT_array, magma_int_t lddt, float **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_zgeqrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_zgeqrf_expert_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dR_array, magma_int_t lddr, magmaDoubleComplex **dT_array, magma_int_t lddt, magmaDoubleComplex **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_cgeqrf_batched_smallsq (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_dgeqrf_batched_smallsq (magma_int_t n, double **dA_array, magma_int_t ldda, double **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_sgeqrf_batched_smallsq (magma_int_t n, float **dA_array, magma_int_t ldda, float **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R. More...  
magma_int_t  magma_zgeqrf_batched_smallsq (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R. More...  
magma_int_t magma_cgeqrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magmaFloatComplex **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_cgeqrf_expert_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magmaFloatComplex **  dR_array,  
magma_int_t  lddr,  
magmaFloatComplex **  dT_array,  
magma_int_t  lddt,  
magmaFloatComplex **  dtau_array,  
magma_int_t  provide_RT,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[in,out]  dR_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDR, N/NB) dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. 
[in]  lddr  INTEGER The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1 otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[in,out]  dT_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDT, N/NB) dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. 
[in]  lddt  INTEGER The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[in]  provide_RT  INTEGER provide_RT = 0 no R and no T in output. dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1 the whole R of size (min(M,N), N) and the nbxnb block of T are provided in output. provide_RT = 2 the nbxnb diag block of R and of T are provided in output. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_dgeqrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
double **  dA_array,  
magma_int_t  ldda,  
double **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_dgeqrf_expert_batched  (  magma_int_t  m, 
magma_int_t  n,  
double **  dA_array,  
magma_int_t  ldda,  
double **  dR_array,  
magma_int_t  lddr,  
double **  dT_array,  
magma_int_t  lddt,  
double **  dtau_array,  
magma_int_t  provide_RT,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[in,out]  dR_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDR, N/NB) dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. 
[in]  lddr  INTEGER The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1 otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[in,out]  dT_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDT, N/NB) dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. 
[in]  lddt  INTEGER The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[in]  provide_RT  INTEGER provide_RT = 0 no R and no T in output. dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1 the whole R of size (min(M,N), N) and the nbxnb block of T are provided in output. provide_RT = 2 the nbxnb diag block of R and of T are provided in output. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_sgeqrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
float **  dA_array,  
magma_int_t  ldda,  
float **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a REAL array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_sgeqrf_expert_batched  (  magma_int_t  m, 
magma_int_t  n,  
float **  dA_array,  
magma_int_t  ldda,  
float **  dR_array,  
magma_int_t  lddr,  
float **  dT_array,  
magma_int_t  lddt,  
float **  dtau_array,  
magma_int_t  provide_RT,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[in,out]  dR_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDR, N/NB) dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. 
[in]  lddr  INTEGER The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1 otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[in,out]  dT_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDT, N/NB) dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. 
[in]  lddt  INTEGER The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a REAL array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[in]  provide_RT  INTEGER provide_RT = 0 no R and no T in output. dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1 the whole R of size (min(M,N), N) and the nbxnb block of T are provided in output. provide_RT = 2 the nbxnb diag block of R and of T are provided in output. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_zgeqrf_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magmaDoubleComplex **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_zgeqrf_expert_batched  (  magma_int_t  m, 
magma_int_t  n,  
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magmaDoubleComplex **  dR_array,  
magma_int_t  lddr,  
magmaDoubleComplex **  dT_array,  
magma_int_t  lddt,  
magmaDoubleComplex **  dtau_array,  
magma_int_t  provide_RT,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
[in]  m  INTEGER The number of rows of the matrix A. M >= 0. 
[in]  n  INTEGER The number of columns of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[in,out]  dR_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDR, N/NB) dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. 
[in]  lddr  INTEGER The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1 otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[in,out]  dT_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDT, N/NB) dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. 
[in]  lddt  INTEGER The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalescent memory accesses LDDR must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[in]  provide_RT  INTEGER provide_RT = 0 no R and no T in output. dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1 the whole R of size (min(M,N), N) and the nbxnb block of T are provided in output. provide_RT = 2 the nbxnb diag block of R and of T are provided in output. 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_cgeqrf_batched_smallsq  (  magma_int_t  n, 
magmaFloatComplex **  dA_array,  
magma_int_t  ldda,  
magmaFloatComplex **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
CGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
This is a batched version of the routine, and works only for small square matrices of size up to 32.
[in]  n  INTEGER The size of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_dgeqrf_batched_smallsq  (  magma_int_t  n, 
double **  dA_array,  
magma_int_t  ldda,  
double **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
DGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
This is a batched version of the routine, and works only for small square matrices of size up to 32.
[in]  n  INTEGER The size of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_sgeqrf_batched_smallsq  (  magma_int_t  n, 
float **  dA_array,  
magma_int_t  ldda,  
float **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
SGEQRF computes a QR factorization of a real MbyN matrix A: A = Q * R.
This is a batched version of the routine, and works only for small square matrices of size up to 32.
[in]  n  INTEGER The size of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a REAL array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a real scalar, and v is a real vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
magma_int_t magma_zgeqrf_batched_smallsq  (  magma_int_t  n, 
magmaDoubleComplex **  dA_array,  
magma_int_t  ldda,  
magmaDoubleComplex **  dtau_array,  
magma_int_t *  info_array,  
magma_int_t  batchCount,  
magma_queue_t  queue  
) 
ZGEQRF computes a QR factorization of a complex MbyN matrix A: A = Q * R.
This is a batched version of the routine, and works only for small square matrices of size up to 32.
[in]  n  INTEGER The size of the matrix A. N >= 0. 
[in,out]  dA_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N) On entry, the MbyN matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)byN upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalescent memory accesses LDDA must be divisible by 16. 
[out]  dtau_array  Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (min(M,N)) The scalar factors of the elementary reflectors (see Further Details). 
[out]  info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.

[in]  batchCount  INTEGER The number of matrices to operate on. 
[in]  queue  magma_queue_t Queue to execute in. 
The matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I  tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).