MAGMA
2.3.0
Matrix Algebra for GPU and Multicore Architectures

Functions  
magma_int_t  magma_chetrd_he2hb (magma_uplo_t uplo, magma_int_t n, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *work, magma_int_t lwork, magmaFloatComplex_ptr dT, magma_int_t *info) 
CHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
magma_int_t  magma_chetrd_he2hb_mgpu (magma_uplo_t uplo, magma_int_t n, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *work, magma_int_t lwork, magmaFloatComplex_ptr dAmgpu[], magma_int_t ldda, magmaFloatComplex_ptr dTmgpu[], magma_int_t lddt, magma_int_t ngpu, magma_int_t distblk, magma_queue_t queues[][20], magma_int_t nqueue, magma_int_t *info) 
CHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
magma_int_t  magma_zhetrd_he2hb (magma_uplo_t uplo, magma_int_t n, magma_int_t nb, magmaDoubleComplex *A, magma_int_t lda, magmaDoubleComplex *tau, magmaDoubleComplex *work, magma_int_t lwork, magmaDoubleComplex_ptr dT, magma_int_t *info) 
ZHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
magma_int_t  magma_zhetrd_he2hb_mgpu (magma_uplo_t uplo, magma_int_t n, magma_int_t nb, magmaDoubleComplex *A, magma_int_t lda, magmaDoubleComplex *tau, magmaDoubleComplex *work, magma_int_t lwork, magmaDoubleComplex_ptr dAmgpu[], magma_int_t ldda, magmaDoubleComplex_ptr dTmgpu[], magma_int_t lddt, magma_int_t ngpu, magma_int_t distblk, magma_queue_t queues[][20], magma_int_t nqueue, magma_int_t *info) 
ZHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
magma_int_t magma_chetrd_he2hb  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magma_int_t  nb,  
magmaFloatComplex *  A,  
magma_int_t  lda,  
magmaFloatComplex *  tau,  
magmaFloatComplex *  work,  
magma_int_t  lwork,  
magmaFloatComplex_ptr  dT,  
magma_int_t *  info  
) 
CHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
This version stores the triangular matrices T used in the accumulated Householder transformations (I - V T V').
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.

[in]  n  INTEGER The order of the matrix A. n >= 0. 
[in]  nb  INTEGER The inner blocking. nb >= 0. 
[in,out]  A  COMPLEX array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the upper band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements above the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the lower band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements below the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  tau  COMPLEX array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details).
[out]  work  (workspace) COMPLEX array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK[0] returns the optimal LWORK. 
[in]  lwork  INTEGER The dimension of the array WORK. LWORK >= 1. For optimum performance LWORK >= N*NB, where NB is the optimal blocksize. If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.
[out]  dT  COMPLEX array on the GPU, dimension N*NB, where NB is the optimal blocksize. On exit, dT holds the upper triangular matrices T from the accumulated Householder transformations (I - V T V') used in the factorization. The nb x nb matrices T are stored consecutively in memory, one after another.
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value.

If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) . . . H(2) H(1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i).
If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(n-1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i).
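As a minimal sketch of the reflector formula above, the following C code applies H = I - tau * v * v' to a vector, reading v' as the conjugate transpose. It uses C99 float complex as a stand-in for magmaFloatComplex, and apply_reflector is a hypothetical helper for illustration only, not a MAGMA routine.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not a MAGMA routine): overwrite x with H*x, where
 * H = I - tau * v * v^H and both v and x have length n.
 * Uses the identity H*x = x - tau * (v^H x) * v, so H is never formed. */
static void apply_reflector(size_t n, float complex tau,
                            const float complex *v, float complex *x)
{
    assert(v != NULL && x != NULL);

    /* dot = v^H x (the entries of v are conjugated) */
    float complex dot = 0.0f;
    for (size_t k = 0; k < n; ++k)
        dot += conjf(v[k]) * x[k];

    /* x <- x - tau * dot * v */
    for (size_t k = 0; k < n; ++k)
        x[k] -= tau * dot * v[k];
}
```

Working with the product form costs O(n) per reflector and avoids building the n-by-n matrix H. The blocked form (I - V T V') stored in dT groups several such reflectors into one update.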
The contents of A on exit are illustrated by the following examples with n = 5:
if UPLO = MagmaUpper:                if UPLO = MagmaLower:

  (  d   e   v2  v3  v4 )              (  d                  )
  (      d   e   v3  v4 )              (  e   d              )
  (          d   e   v4 )              (  v1  e   d          )
  (              d   e  )              (  v1  v2  e   d      )
  (                  d  )              (  v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i).
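To illustrate the on-exit layout for UPLO = MagmaLower, the sketch below copies the lower band of a column-major array A into compact LAPACK-style band storage, skipping the reflector entries below the band. The helper name extract_lower_band, the parameter kd (number of sub-diagonals) and the use of C99 float complex in place of magmaFloatComplex are all assumptions made for illustration. Passing kd = nb further assumes that the band produced by the routine has nb sub-diagonals.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not a MAGMA routine): copy the lower band of the
 * n-by-n column-major matrix a (leading dimension lda) into band storage ab
 * (leading dimension ldab >= kd + 1), so that ab[(i-j) + j*ldab] = a(i,j)
 * for j <= i <= min(n-1, j+kd). Entries of ab that fall past the last row
 * of the matrix are left untouched. */
static void extract_lower_band(size_t n, size_t kd,
                               const float complex *a, size_t lda,
                               float complex *ab, size_t ldab)
{
    assert(lda >= (n > 1 ? n : 1) && ldab >= kd + 1);

    for (size_t j = 0; j < n; ++j) {
        /* last row of column j that still lies inside the band */
        size_t last = (j + kd < n) ? j + kd : n - 1;
        for (size_t i = j; i <= last; ++i)
            ab[(i - j) + j * ldab] = a[i + j * lda];
    }
}
```

A caller would typically zero-initialize ab first, so that the unused tail of the last kd columns holds zeros.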
magma_int_t magma_chetrd_he2hb_mgpu  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magma_int_t  nb,  
magmaFloatComplex *  A,  
magma_int_t  lda,  
magmaFloatComplex *  tau,  
magmaFloatComplex *  work,  
magma_int_t  lwork,  
magmaFloatComplex_ptr  dAmgpu[],  
magma_int_t  ldda,  
magmaFloatComplex_ptr  dTmgpu[],  
magma_int_t  lddt,  
magma_int_t  ngpu,  
magma_int_t  distblk,  
magma_queue_t  queues[][20],  
magma_int_t  nqueue,  
magma_int_t *  info  
) 
CHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
This version stores the triangular matrices T used in the accumulated Householder transformations (I - V T V').
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.

[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in]  nb  INTEGER The inner blocking. nb >= 0. 
[in,out]  A  COMPLEX array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the upper band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements above the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the lower band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements below the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  tau  COMPLEX array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details).
[out]  work  (workspace) COMPLEX array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK[0] returns the optimal LWORK. 
[in]  lwork  INTEGER The dimension of the array WORK. LWORK >= 1. For optimum performance LWORK >= N*NB, where NB is the optimal blocksize. If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.
[in,out]  dAmgpu  Array of pointers, dimension (ngpu). Each entry points to a COMPLEX array, dimension (LDDA, nlocal), which holds the local matrix on that GPU.
[in]  ldda  INTEGER The leading dimension of the array dAmgpu. ldda >= max(1,n). 
[in,out]  dTmgpu  Array of pointers, dimension (ngpu). Each entry points to a COMPLEX array on the GPU, dimension n*nb, where nb is the optimal blocksize. On exit, each array holds the upper triangular matrices T from the accumulated Householder transformations (I - V T V') used in the factorization. The nb x nb matrices T are stored consecutively in memory, one after another.
[in]  lddt  INTEGER The leading dimension of each array in dTmgpu. lddt >= max(1,nb).
[in]  ngpu  INTEGER The number of GPUs. 
[in]  distblk  INTEGER Internal parameter for performance tuning. The size of the distribution/computation. 
[in]  queues  Array of magma_queue_t, the queues to be used for execution and communication. Dimension >= max(3, ngpu+1).
[in]  nqueue  INTEGER The number of queues. Should be >= max(3, ngpu+1).
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value.

If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) . . . H(2) H(1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i).
If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(n-1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i).
The contents of A on exit are illustrated by the following examples with n = 5:
if UPLO = MagmaUpper:                if UPLO = MagmaLower:

  (  d   e   v2  v3  v4 )              (  d                  )
  (      d   e   v3  v4 )              (  e   d              )
  (          d   e   v4 )              (  v1  e   d          )
  (              d   e  )              (  v1  v2  e   d      )
  (                  d  )              (  v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i).
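The size constraints listed for the multi-GPU variant can be checked before the call. The sketch below collects the documented inequalities in one place. It uses plain int in place of magma_int_t, the helper names are hypothetical, and the requirement ngpu >= 1 is an assumption that the parameter list does not state.

```c
#include <stdbool.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest queue count that satisfies nqueue >= max(3, ngpu+1). */
static int min_nqueue(int ngpu) { return max_int(3, ngpu + 1); }

/* Hypothetical pre-flight check (not a MAGMA routine): returns true when the
 * integer arguments satisfy the inequalities given in the parameter list. */
static bool he2hb_mgpu_dims_ok(int n, int nb, int lda, int ldda, int lddt,
                               int ngpu, int nqueue)
{
    return n >= 0 && nb >= 0
        && lda  >= max_int(1, n)       /* LDA  >= max(1,N)  */
        && ldda >= max_int(1, n)       /* ldda >= max(1,n)  */
        && lddt >= max_int(1, nb)      /* lddt >= max(1,nb) */
        && ngpu >= 1                   /* assumption: at least one GPU */
        && nqueue >= min_nqueue(ngpu); /* nqueue >= max(3, ngpu+1) */
}
```

With two GPUs the lower bound on nqueue is still 3; it only grows past 3 from three GPUs onward.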
magma_int_t magma_zhetrd_he2hb  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magma_int_t  nb,  
magmaDoubleComplex *  A,  
magma_int_t  lda,  
magmaDoubleComplex *  tau,  
magmaDoubleComplex *  work,  
magma_int_t  lwork,  
magmaDoubleComplex_ptr  dT,  
magma_int_t *  info  
) 
ZHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
This version stores the triangular matrices T used in the accumulated Householder transformations (I - V T V').
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.

[in]  n  INTEGER The order of the matrix A. n >= 0. 
[in]  nb  INTEGER The inner blocking. nb >= 0. 
[in,out]  A  COMPLEX_16 array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the upper band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements above the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the lower band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements below the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  tau  COMPLEX_16 array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details).
[out]  work  (workspace) COMPLEX_16 array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK[0] returns the optimal LWORK. 
[in]  lwork  INTEGER The dimension of the array WORK. LWORK >= 1. For optimum performance LWORK >= N*NB, where NB is the optimal blocksize. If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.
[out]  dT  COMPLEX_16 array on the GPU, dimension N*NB, where NB is the optimal blocksize. On exit, dT holds the upper triangular matrices T from the accumulated Householder transformations (I - V T V') used in the factorization. The nb x nb matrices T are stored consecutively in memory, one after another.
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value.
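The lwork description above follows the usual two-call pattern: query the size with LWORK = -1, allocate, then call again. The sketch below shows that pattern against a stand-in function. he2hb_like is hypothetical and only mimics the documented LWORK convention; it is not magma_zhetrd_he2hb and performs no reduction. C99 double complex stands in for magmaDoubleComplex, and the negative info code assumes LAPACK-style argument numbering.

```c
#include <complex.h>
#include <stdlib.h>

/* Hypothetical stand-in that mimics only the documented LWORK convention. */
static void he2hb_like(int n, int nb, double complex *work, int lwork, int *info)
{
    int optimal = (n * nb > 1) ? n * nb : 1;   /* LWORK >= N*NB, at least 1 */
    *info = 0;
    if (lwork == -1) {          /* workspace query: report the size and return */
        work[0] = (double)optimal;
        return;
    }
    if (lwork < 1) {
        *info = -8;             /* assumption: lwork is the 8th argument */
        return;
    }
    /* ... a real routine would perform the reduction here ... */
    work[0] = (double)optimal;
}

/* Two-call idiom: query, allocate, call again. Returns 0 on success. */
static int run_with_queried_workspace(int n, int nb, int *lwork_out)
{
    double complex query;
    int info;

    he2hb_like(n, nb, &query, -1, &info);      /* first call: size query */
    if (info != 0)
        return info;

    int lwork = (int)creal(query);
    double complex *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return 1;

    he2hb_like(n, nb, work, lwork, &info);     /* second call: real work */
    free(work);
    *lwork_out = lwork;
    return info;
}
```

Reading the size back from the first entry of WORK keeps the caller independent of the blocksize chosen inside the routine.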

If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) . . . H(2) H(1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i).
If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(n-1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i).
The contents of A on exit are illustrated by the following examples with n = 5:
if UPLO = MagmaUpper:                if UPLO = MagmaLower:

  (  d   e   v2  v3  v4 )              (  d                  )
  (      d   e   v3  v4 )              (  e   d              )
  (          d   e   v4 )              (  v1  e   d          )
  (              d   e  )              (  v1  v2  e   d      )
  (                  d  )              (  v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i).
magma_int_t magma_zhetrd_he2hb_mgpu  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magma_int_t  nb,  
magmaDoubleComplex *  A,  
magma_int_t  lda,  
magmaDoubleComplex *  tau,  
magmaDoubleComplex *  work,  
magma_int_t  lwork,  
magmaDoubleComplex_ptr  dAmgpu[],  
magma_int_t  ldda,  
magmaDoubleComplex_ptr  dTmgpu[],  
magma_int_t  lddt,  
magma_int_t  ngpu,  
magma_int_t  distblk,  
magma_queue_t  queues[][20],  
magma_int_t  nqueue,  
magma_int_t *  info  
) 
ZHETRD_HE2HB reduces a complex Hermitian matrix A to real symmetric band-diagonal form T by a unitary similarity transformation: Q**H * A * Q = T.
This version stores the triangular matrices T used in the accumulated Householder transformations (I - V T V').
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.

[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in]  nb  INTEGER The inner blocking. nb >= 0. 
[in,out]  A  COMPLEX_16 array, dimension (LDA,N) On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if UPLO = MagmaUpper, the upper band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements above the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors; if UPLO = MagmaLower, the lower band-diagonal of A is overwritten by the corresponding elements of the band-diagonal matrix T, and the elements below the band diagonal, with the array TAU, represent the unitary matrix Q as a product of elementary reflectors. See Further Details.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  tau  COMPLEX_16 array, dimension (N-1) The scalar factors of the elementary reflectors (see Further Details).
[out]  work  (workspace) COMPLEX_16 array, dimension (MAX(1,LWORK)) On exit, if INFO = 0, WORK[0] returns the optimal LWORK. 
[in]  lwork  INTEGER The dimension of the array WORK. LWORK >= 1. For optimum performance LWORK >= N*NB, where NB is the optimal blocksize. If LWORK = -1, then a workspace query is assumed; the routine only calculates the optimal size of the WORK array, returns this value as the first entry of the WORK array, and no error message related to LWORK is issued by XERBLA.
[in,out]  dAmgpu  Array of pointers, dimension (ngpu). Each entry points to a COMPLEX_16 array, dimension (LDDA, nlocal), which holds the local matrix on that GPU.
[in]  ldda  INTEGER The leading dimension of the array dAmgpu. ldda >= max(1,n). 
[in,out]  dTmgpu  Array of pointers, dimension (ngpu). Each entry points to a COMPLEX_16 array on the GPU, dimension n*nb, where nb is the optimal blocksize. On exit, each array holds the upper triangular matrices T from the accumulated Householder transformations (I - V T V') used in the factorization. The nb x nb matrices T are stored consecutively in memory, one after another.
[in]  lddt  INTEGER The leading dimension of each array in dTmgpu. lddt >= max(1,nb).
[in]  ngpu  INTEGER The number of GPUs. 
[in]  distblk  INTEGER Internal parameter for performance tuning. The size of the distribution/computation. 
[in]  queues  Array of magma_queue_t, the queues to be used for execution and communication. Dimension >= max(3, ngpu+1).
[in]  nqueue  INTEGER The number of queues. Should be >= max(3, ngpu+1).
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value.

If UPLO = MagmaUpper, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) . . . H(2) H(1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(1:i-1,i+1), and tau in TAU(i).
If UPLO = MagmaLower, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) . . . H(n-1).
Each H(i) has the form
H(i) = I - tau * v * v'
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(i+2:n,i), and tau in TAU(i).
The contents of A on exit are illustrated by the following examples with n = 5:
if UPLO = MagmaUpper:                if UPLO = MagmaLower:

  (  d   e   v2  v3  v4 )              (  d                  )
  (      d   e   v3  v4 )              (  e   d              )
  (          d   e   v4 )              (  v1  e   d          )
  (              d   e  )              (  v1  v2  e   d      )
  (                  d  )              (  v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i).
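The dTmgpu description says that the nb x nb matrices T sit one after another in memory. One possible reading of that layout is sketched below: column-major storage with leading dimension lddt, where block k occupies columns k*nb through (k+1)*nb - 1. This interpretation and both helper names are assumptions for illustration; the parameter list does not spell out the indexing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: element offset of the k-th nb-by-nb block T,
 * assuming column-major storage with leading dimension lddt and blocks
 * placed side by side. With lddt == nb this reduces to k * nb * nb. */
static size_t t_block_offset(size_t k, size_t nb, size_t lddt)
{
    return k * nb * lddt;
}

/* Hypothetical helper: true when block k lies entirely inside an array of
 * n*nb elements, for the tightly packed case lddt == nb. */
static bool t_block_fits(size_t k, size_t n, size_t nb)
{
    return (k + 1) * nb * nb <= n * nb;
}
```

Under these assumptions a device pointer to block k would be dTmgpu[dev] + t_block_offset(k, nb, lddt).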