MAGMA 2.3.0 Matrix Algebra for GPU and Multicore Architectures
geqrf: QR factorization

## Functions

magma_int_t magma_cgeqrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

magma_int_t magma_cgeqrf_expert_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dR_array, magma_int_t lddr, magmaFloatComplex **dT_array, magma_int_t lddt, magmaFloatComplex **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

magma_int_t magma_dgeqrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

magma_int_t magma_dgeqrf_expert_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, double **dR_array, magma_int_t lddr, double **dT_array, magma_int_t lddt, double **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

magma_int_t magma_sgeqrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

magma_int_t magma_sgeqrf_expert_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, float **dR_array, magma_int_t lddr, float **dT_array, magma_int_t lddt, float **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

magma_int_t magma_zgeqrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

magma_int_t magma_zgeqrf_expert_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dR_array, magma_int_t lddr, magmaDoubleComplex **dT_array, magma_int_t lddt, magmaDoubleComplex **dtau_array, magma_int_t provide_RT, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

magma_int_t magma_cgeqrf_batched_smallsq (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magmaFloatComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
CGEQRF computes a QR factorization of a complex N-by-N matrix A: A = Q * R.

magma_int_t magma_dgeqrf_batched_smallsq (magma_int_t n, double **dA_array, magma_int_t ldda, double **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
DGEQRF computes a QR factorization of a real N-by-N matrix A: A = Q * R.

magma_int_t magma_sgeqrf_batched_smallsq (magma_int_t n, float **dA_array, magma_int_t ldda, float **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGEQRF computes a QR factorization of a real N-by-N matrix A: A = Q * R.

magma_int_t magma_zgeqrf_batched_smallsq (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magmaDoubleComplex **dtau_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGEQRF computes a QR factorization of a complex N-by-N matrix A: A = Q * R.

## Function Documentation

 magma_int_t magma_cgeqrf_batched ( magma_int_t m, magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magmaFloatComplex ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |
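The LDDA constraints above (LDDA >= max(1,M), ideally a multiple of 16) can be satisfied by rounding M up before allocating. The following is a minimal sketch; `padded_ld` is a hypothetical helper written for illustration, not a MAGMA function.

```c
/* Hypothetical helper (not part of MAGMA): round m up to the next
   multiple of `align`. An empty matrix (m == 0) still gets `align`,
   so the result always satisfies ld >= max(1, m). */
static int padded_ld(int m, int align)
{
    int ld = ((m + align - 1) / align) * align;
    return ld > 0 ? ld : align;
}
```

For example, `padded_ld(17, 16)` yields 32, which would then be passed as `ldda` and used as the column stride when allocating each matrix.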

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).
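In the complex case, v' denotes the conjugate transpose of v. As an illustration of how a single reflector stored in this format acts on a vector, here is a small CPU sketch. It uses C99 `float complex` rather than `magmaFloatComplex`, uses 0-based indices, and `apply_reflector_c` is a hypothetical name chosen for this example.

```c
#include <complex.h>

/* Overwrite x (length m) with H(i) * x, where H(i) = I - tau * v * v^H.
   a_col points to column i of the factored matrix (0-based i):
   v(0:i-1) = 0, v(i) = 1 is implicit, and v(i+1:m-1) is read from
   a_col[i+1 .. m-1]. The entry a_col[i] belongs to R and is ignored. */
static void apply_reflector_c(int m, int i, const float complex *a_col,
                              float complex tau, float complex *x)
{
    /* w = v^H x, using the implicit leading 1 of v */
    float complex w = x[i];
    for (int r = i + 1; r < m; ++r)
        w += conjf(a_col[r]) * x[r];

    /* x = x - tau * v * w */
    w *= tau;
    x[i] -= w;
    for (int r = i + 1; r < m; ++r)
        x[r] -= w * a_col[r];
}
```

Applying H(k), then H(k-1), and so on down to H(1) with this routine multiplies a vector by Q.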

 magma_int_t magma_cgeqrf_expert_batched ( magma_int_t m, magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magmaFloatComplex ** dR_array, magma_int_t lddr, magmaFloatComplex ** dT_array, magma_int_t lddt, magmaFloatComplex ** dtau_array, magma_int_t provide_RT, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [in,out] | dR_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDR, N/NB). dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDR, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. |
| [in] | lddr | INTEGER. The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1; otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDR must be divisible by 16. |
| [in,out] | dT_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDT, N/NB). dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. |
| [in] | lddt | INTEGER. The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDT must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [in] | provide_RT | INTEGER. provide_RT = 0: no R and no T in output; dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1: the whole R of size (min(M,N), N) and the NB-by-NB block of T are provided in output. provide_RT = 2: the NB-by-NB diagonal blocks of R and of T are provided in output. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |
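The lower bounds on LDDR and LDDT depend on `provide_RT` and on the blocking size NB. The sketch below encodes only the inequalities stated in the parameter descriptions. NB is chosen internally by the library, so the `nb` argument here is an assumption supplied by the caller, and `rt_ld` and `min_rt_ld` are hypothetical names used for illustration.

```c
/* Smallest leading dimensions allowed by the documented rules.
   Not a MAGMA API; nb stands in for the library's internal NB. */
typedef struct {
    int lddr;
    int lddt;
} rt_ld;

static int min_int(int a, int b) { return a < b ? a : b; }

static rt_ld min_rt_ld(int m, int n, int nb, int provide_RT)
{
    int k  = min_int(m, n);   /* min(M,N) */
    int kb = min_int(nb, k);  /* min(NB, min(M,N)) */
    rt_ld ld;
    /* LDDR >= min(M,N) only when the whole R is requested */
    ld.lddr = (provide_RT == 1) ? k : kb;
    /* LDDT >= min(NB, min(M,N)) in every mode */
    ld.lddt = kb;
    return ld;
}
```

In practice these minimums would usually be rounded up to a multiple of 16 before allocation, as the parameter descriptions suggest.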

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_dgeqrf_batched ( magma_int_t m, magma_int_t n, double ** dA_array, magma_int_t ldda, double ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |
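Because `info_array` holds one status per matrix, a caller typically scans it after the batch completes. The sketch below assumes the status values have already been copied into a host array (the copy is not shown) and uses plain `int` in place of `magma_int_t`; `first_failed` is a hypothetical helper, not a MAGMA function.

```c
/* Return the index of the first matrix whose status is nonzero,
   or -1 if every entry reports success (info == 0). */
static int first_failed(const int *info_host, int batchCount)
{
    for (int i = 0; i < batchCount; ++i) {
        if (info_host[i] != 0)
            return i;
    }
    return -1;
}
```

A negative entry `-i` at the returned index points at the i-th argument, following the convention described for `info_array`.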

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_dgeqrf_expert_batched ( magma_int_t m, magma_int_t n, double ** dA_array, magma_int_t ldda, double ** dR_array, magma_int_t lddr, double ** dT_array, magma_int_t lddt, double ** dtau_array, magma_int_t provide_RT, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [in,out] | dR_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDR, N/NB). dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDR, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. |
| [in] | lddr | INTEGER. The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1; otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDR must be divisible by 16. |
| [in,out] | dT_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDT, N/NB). dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. |
| [in] | lddt | INTEGER. The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDT must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [in] | provide_RT | INTEGER. provide_RT = 0: no R and no T in output; dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1: the whole R of size (min(M,N), N) and the NB-by-NB block of T are provided in output. provide_RT = 2: the NB-by-NB diagonal blocks of R and of T are provided in output. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_sgeqrf_batched ( magma_int_t m, magma_int_t n, float ** dA_array, magma_int_t ldda, float ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |
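`dA_array` is an array of `batchCount` pointers, one per matrix. When the matrices are stored back to back in one buffer, each pointer is the base address plus a fixed stride of `ldda * n` elements. The sketch below shows only that address arithmetic on ordinary host memory; `set_batch_pointers` is a hypothetical helper. In a real program the base buffer would be GPU memory and the pointer array would also have to be made available to the GPU, which this sketch does not do.

```c
#include <stddef.h>

/* Fill ptrs[0 .. batchCount-1] so that ptrs[i] addresses the i-th
   ldda-by-n column-major matrix inside one contiguous buffer. */
static void set_batch_pointers(float **ptrs, float *base,
                               int ldda, int n, int batchCount)
{
    size_t stride = (size_t)ldda * (size_t)n;  /* elements per matrix */
    for (int i = 0; i < batchCount; ++i)
        ptrs[i] = base + (size_t)i * stride;
}
```

The same layout applies to `dtau_array`, with a stride of min(M,N) elements per matrix.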

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_sgeqrf_expert_batched ( magma_int_t m, magma_int_t n, float ** dA_array, magma_int_t ldda, float ** dR_array, magma_int_t lddr, float ** dT_array, magma_int_t lddt, float ** dtau_array, magma_int_t provide_RT, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [in,out] | dR_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDR, N/NB). dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDR, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. |
| [in] | lddr | INTEGER. The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1; otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDR must be divisible by 16. |
| [in,out] | dT_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDT, N/NB). dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. |
| [in] | lddt | INTEGER. The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDT must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [in] | provide_RT | INTEGER. provide_RT = 0: no R and no T in output; dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1: the whole R of size (min(M,N), N) and the NB-by-NB block of T are provided in output. provide_RT = 2: the NB-by-NB diagonal blocks of R and of T are provided in output. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_zgeqrf_batched ( magma_int_t m, magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magmaDoubleComplex ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_zgeqrf_expert_batched ( magma_int_t m, magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magmaDoubleComplex ** dR_array, magma_int_t lddr, magmaDoubleComplex ** dT_array, magma_int_t lddt, magmaDoubleComplex ** dtau_array, magma_int_t provide_RT, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | m | INTEGER. The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER. The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, the M-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the min(M,N)-by-N upper trapezoidal matrix R (R is upper triangular if m >= n); the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of min(m,n) elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,M). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [in,out] | dR_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDR, N/NB). dR should be of size (LDDR, N) when provide_RT > 0 and of size (LDDR, NB) otherwise. NB is the local blocking size. On exit, the elements of R are stored in dR only when provide_RT > 0. |
| [in] | lddr | INTEGER. The leading dimension of the array dR. LDDR >= min(M,N) when provide_RT == 1; otherwise LDDR >= min(NB, min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDR must be divisible by 16. |
| [in,out] | dT_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDT, N/NB). dT should be of size (LDDT, N) when provide_RT > 0 and of size (LDDT, NB) otherwise. NB is the local blocking size. On exit, the elements of T are stored in dT only when provide_RT > 0. |
| [in] | lddt | INTEGER. The leading dimension of the array dT. LDDT >= min(NB,min(M,N)). NB is the local blocking size. To benefit from coalesced memory accesses, LDDT must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (min(M,N)). The scalar factors of the elementary reflectors (see Further Details). |
| [in] | provide_RT | INTEGER. provide_RT = 0: no R and no T in output; dR and dT are used as local workspace to store the R and T of each step. provide_RT = 1: the whole R of size (min(M,N), N) and the NB-by-NB block of T are provided in output. provide_RT = 2: the NB-by-NB diagonal blocks of R and of T are provided in output. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_cgeqrf_batched_smallsq ( magma_int_t n, magmaFloatComplex ** dA_array, magma_int_t ldda, magmaFloatComplex ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

CGEQRF computes a QR factorization of a complex N-by-N matrix A: A = Q * R.

This is a batched version of the routine, and works only for small square matrices of size up to 32.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | n | INTEGER. The size of the square matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, the N-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the N-by-N upper triangular matrix R; the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of N elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (N). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_dgeqrf_batched_smallsq ( magma_int_t n, double ** dA_array, magma_int_t ldda, double ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

DGEQRF computes a QR factorization of a real N-by-N matrix A: A = Q * R.

This is a batched version of the routine, and works only for small square matrices of size up to 32.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | n | INTEGER. The size of the square matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, the N-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the N-by-N upper triangular matrix R; the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of N elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (N). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_sgeqrf_batched_smallsq ( magma_int_t n, float ** dA_array, magma_int_t ldda, float ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

SGEQRF computes a QR factorization of a real N-by-N matrix A: A = Q * R.

This is a batched version of the routine, and works only for small square matrices of size up to 32.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | n | INTEGER. The size of the square matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, the N-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the N-by-N upper triangular matrix R; the elements below the diagonal, with the array TAU, represent the orthogonal matrix Q as a product of N elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (N). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).

 magma_int_t magma_zgeqrf_batched_smallsq ( magma_int_t n, magmaDoubleComplex ** dA_array, magma_int_t ldda, magmaDoubleComplex ** dtau_array, magma_int_t * info_array, magma_int_t batchCount, magma_queue_t queue )

ZGEQRF computes a QR factorization of a complex N-by-N matrix A: A = Q * R.

This is a batched version of the routine, and works only for small square matrices of size up to 32.

Parameters

| Direction | Parameter | Description |
|---|---|---|
| [in] | n | INTEGER. The size of the square matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, the N-by-N matrix A. On exit, the elements on and above the diagonal of the array contain the N-by-N upper triangular matrix R; the elements below the diagonal, with the array TAU, represent the unitary matrix Q as a product of N elementary reflectors (see Further Details). |
| [in] | ldda | INTEGER. The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. |
| [out] | dtau_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (N). The scalar factors of the elementary reflectors (see Further Details). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. = 0: successful exit. |
| [in] | batchCount | INTEGER. The number of matrices to operate on. |
| [in] | queue | magma_queue_t. Queue to execute in. |

## Further Details

The matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(k), where k = min(m,n).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in A(i+1:m,i), and tau in TAU(i).