MAGMA  2.3.0
Matrix Algebra for GPU and Multicore Architectures
 All Classes Files Functions Friends Groups Pages
lahr2: Partial factorization; used by gehrd

Functions

magma_int_t magma_clahr2 (magma_int_t n, magma_int_t k, magma_int_t nb, magmaFloatComplex_ptr dA, magma_int_t ldda, magmaFloatComplex_ptr dV, magma_int_t lddv, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *T, magma_int_t ldt, magmaFloatComplex *Y, magma_int_t ldy, magma_queue_t queue)
 CLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_clahr2_m (magma_int_t n, magma_int_t k, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *T, magma_int_t ldt, magmaFloatComplex *Y, magma_int_t ldy, struct cgehrd_data *data)
 CLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_dlahr2 (magma_int_t n, magma_int_t k, magma_int_t nb, magmaDouble_ptr dA, magma_int_t ldda, magmaDouble_ptr dV, magma_int_t lddv, double *A, magma_int_t lda, double *tau, double *T, magma_int_t ldt, double *Y, magma_int_t ldy, magma_queue_t queue)
 DLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_dlahr2_m (magma_int_t n, magma_int_t k, magma_int_t nb, double *A, magma_int_t lda, double *tau, double *T, magma_int_t ldt, double *Y, magma_int_t ldy, struct dgehrd_data *data)
 DLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_slahr2 (magma_int_t n, magma_int_t k, magma_int_t nb, magmaFloat_ptr dA, magma_int_t ldda, magmaFloat_ptr dV, magma_int_t lddv, float *A, magma_int_t lda, float *tau, float *T, magma_int_t ldt, float *Y, magma_int_t ldy, magma_queue_t queue)
 SLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_slahr2_m (magma_int_t n, magma_int_t k, magma_int_t nb, float *A, magma_int_t lda, float *tau, float *T, magma_int_t ldt, float *Y, magma_int_t ldy, struct sgehrd_data *data)
 SLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_zlahr2 (magma_int_t n, magma_int_t k, magma_int_t nb, magmaDoubleComplex_ptr dA, magma_int_t ldda, magmaDoubleComplex_ptr dV, magma_int_t lddv, magmaDoubleComplex *A, magma_int_t lda, magmaDoubleComplex *tau, magmaDoubleComplex *T, magma_int_t ldt, magmaDoubleComplex *Y, magma_int_t ldy, magma_queue_t queue)
 ZLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 
magma_int_t magma_zlahr2_m (magma_int_t n, magma_int_t k, magma_int_t nb, magmaDoubleComplex *A, magma_int_t lda, magmaDoubleComplex *tau, magmaDoubleComplex *T, magma_int_t ldt, magmaDoubleComplex *Y, magma_int_t ldy, struct zgehrd_data *data)
 ZLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero. More...
 

Detailed Description

Function Documentation

magma_int_t magma_clahr2 ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex_ptr  dA,
magma_int_t  ldda,
magmaFloatComplex_ptr  dV,
magma_int_t  lddv,
magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex *  tau,
magmaFloatComplex *  T,
magma_int_t  ldt,
magmaFloatComplex *  Y,
magma_int_t  ldy,
magma_queue_t  queue 
)

CLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by CGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]dACOMPLEX array on the GPU, dimension (LDDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements in rows K:N of the first NB columns are overwritten with the matrix Y.
[in]lddaINTEGER The leading dimension of the array dA. LDDA >= max(1,N).
[out]dVCOMPLEX array on the GPU, dimension (LDDV, NB) On exit this n-by-nb array contains the Householder vectors of the transformation.
[in]lddvINTEGER The leading dimension of the array dV. LDDV >= max(1,N).
[in,out]ACOMPLEX array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauCOMPLEX array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TCOMPLEX array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YCOMPLEX array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_clahr2_m ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex *  tau,
magmaFloatComplex *  T,
magma_int_t  ldt,
magmaFloatComplex *  Y,
magma_int_t  ldy,
struct cgehrd_data data 
)

CLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by CGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]ACOMPLEX array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauCOMPLEX array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TCOMPLEX array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YCOMPLEX array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in,out]dataStructure with pointers to dA, dT, dV, dW, dY which are distributed across multiple GPUs.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_dlahr2 ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaDouble_ptr  dA,
magma_int_t  ldda,
magmaDouble_ptr  dV,
magma_int_t  lddv,
double *  A,
magma_int_t  lda,
double *  tau,
double *  T,
magma_int_t  ldt,
double *  Y,
magma_int_t  ldy,
magma_queue_t  queue 
)

DLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by DGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]dADOUBLE PRECISION array on the GPU, dimension (LDDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements in rows K:N of the first NB columns are overwritten with the matrix Y.
[in]lddaINTEGER The leading dimension of the array dA. LDDA >= max(1,N).
[out]dVDOUBLE PRECISION array on the GPU, dimension (LDDV, NB) On exit this n-by-nb array contains the Householder vectors of the transformation.
[in]lddvINTEGER The leading dimension of the array dV. LDDV >= max(1,N).
[in,out]ADOUBLE PRECISION array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauDOUBLE PRECISION array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TDOUBLE PRECISION array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YDOUBLE PRECISION array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_dlahr2_m ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
double *  A,
magma_int_t  lda,
double *  tau,
double *  T,
magma_int_t  ldt,
double *  Y,
magma_int_t  ldy,
struct dgehrd_data data 
)

DLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by DGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]ADOUBLE PRECISION array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauDOUBLE PRECISION array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TDOUBLE PRECISION array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YDOUBLE PRECISION array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in,out]dataStructure with pointers to dA, dT, dV, dW, dY which are distributed across multiple GPUs.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_slahr2 ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaFloat_ptr  dA,
magma_int_t  ldda,
magmaFloat_ptr  dV,
magma_int_t  lddv,
float *  A,
magma_int_t  lda,
float *  tau,
float *  T,
magma_int_t  ldt,
float *  Y,
magma_int_t  ldy,
magma_queue_t  queue 
)

SLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by SGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]dAREAL array on the GPU, dimension (LDDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements in rows K:N of the first NB columns are overwritten with the matrix Y.
[in]lddaINTEGER The leading dimension of the array dA. LDDA >= max(1,N).
[out]dVREAL array on the GPU, dimension (LDDV, NB) On exit this n-by-nb array contains the Householder vectors of the transformation.
[in]lddvINTEGER The leading dimension of the array dV. LDDV >= max(1,N).
[in,out]AREAL array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauREAL array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TREAL array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YREAL array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_slahr2_m ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
float *  A,
magma_int_t  lda,
float *  tau,
float *  T,
magma_int_t  ldt,
float *  Y,
magma_int_t  ldy,
struct sgehrd_data data 
)

SLAHR2 reduces the first NB columns of a real general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by SGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]AREAL array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauREAL array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TREAL array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YREAL array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in,out]dataStructure with pointers to dA, dT, dV, dW, dY which are distributed across multiple GPUs.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a real scalar, and v is a real vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_zlahr2 ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaDoubleComplex_ptr  dA,
magma_int_t  ldda,
magmaDoubleComplex_ptr  dV,
magma_int_t  lddv,
magmaDoubleComplex *  A,
magma_int_t  lda,
magmaDoubleComplex *  tau,
magmaDoubleComplex *  T,
magma_int_t  ldt,
magmaDoubleComplex *  Y,
magma_int_t  ldy,
magma_queue_t  queue 
)

ZLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by ZGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]dACOMPLEX_16 array on the GPU, dimension (LDDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements in rows K:N of the first NB columns are overwritten with the matrix Y.
[in]lddaINTEGER The leading dimension of the array dA. LDDA >= max(1,N).
[out]dVCOMPLEX_16 array on the GPU, dimension (LDDV, NB) On exit this n-by-nb array contains the Householder vectors of the transformation.
[in]lddvINTEGER The leading dimension of the array dV. LDDV >= max(1,N).
[in,out]ACOMPLEX_16 array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauCOMPLEX_16 array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TCOMPLEX_16 array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YCOMPLEX_16 array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_zlahr2_m ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaDoubleComplex *  A,
magma_int_t  lda,
magmaDoubleComplex *  tau,
magmaDoubleComplex *  T,
magma_int_t  ldt,
magmaDoubleComplex *  Y,
magma_int_t  ldy,
struct zgehrd_data data 
)

ZLAHR2 reduces the first NB columns of a complex general n-BY-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by an orthogonal similarity transformation Q' * A * Q. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note this is different than LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by ZGEHRD.

Parameters
[in]nINTEGER The order of the matrix A.
[in]kINTEGER The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]nbINTEGER The number of columns to be reduced.
[in,out]ACOMPLEX_16 array, dimension (LDA,N-K+1) On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]ldaINTEGER The leading dimension of the array A. LDA >= max(1,N).
[out]tauCOMPLEX_16 array, dimension (NB) The scalar factors of the elementary reflectors. See Further Details.
[out]TCOMPLEX_16 array, dimension (LDT,NB) The upper triangular matrix T.
[in]ldtINTEGER The leading dimension of the array T. LDT >= NB.
[out]YCOMPLEX_16 array, dimension (LDY,NB) The n-by-nb matrix Y.
[in]ldyINTEGER The leading dimension of the array Y. LDY >= N.
[in,out]dataStructure with pointers to dA, dT, dV, dW, dY which are distributed across multiple GPUs.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notations described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.