PLASMA  2.4.5
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
core_zherfb.c File Reference
#include <lapacke.h>
#include "common.h"

Macros

#define COMPLEX

Functions

int CORE_zherfb (PLASMA_enum uplo, int n, int k, int ib, int nb, PLASMA_Complex64_t *A, int lda, PLASMA_Complex64_t *T, int ldt, PLASMA_Complex64_t *C, int ldc, PLASMA_Complex64_t *WORK, int ldwork)
void QUARK_CORE_zherfb (Quark *quark, Quark_Task_Flags *task_flags, PLASMA_enum uplo, int n, int k, int ib, int nb, PLASMA_Complex64_t *A, int lda, PLASMA_Complex64_t *T, int ldt, PLASMA_Complex64_t *C, int ldc)
void CORE_zherfb_quark (Quark *quark)

Detailed Description

PLASMA core_blas kernel. PLASMA is a software package provided by Univ. of Tennessee, Univ. of California Berkeley and Univ. of Colorado Denver.

Version:
2.4.5
Author:
Hatem Ltaief
Date:
2010-11-15
Precisions:
normal z -> c d s

Definition in file core_zherfb.c.


Macro Definition Documentation

#define COMPLEX

Definition at line 18 of file core_zherfb.c.


Function Documentation

int CORE_zherfb ( PLASMA_enum  uplo,
int  n,
int  k,
int  ib,
int  nb,
PLASMA_Complex64_t *  A,
int  lda,
PLASMA_Complex64_t *  T,
int  ldt,
PLASMA_Complex64_t *  C,
int  ldc,
PLASMA_Complex64_t *  WORK,
int  ldwork 
)

CORE_zherfb overwrites the Hermitian complex N-by-N tile C with

Q**H*C*Q

where Q is a complex unitary matrix defined as the product of k elementary reflectors

Q = H(1) H(2) . . . H(k)

as returned by CORE_zgeqrt. Only PlasmaLower is supported.

Parameters:
[in] uplo
  • PlasmaLower: the upper part of the Hermitian matrix C is not referenced.
  • PlasmaUpper: the lower part of the Hermitian matrix C is not referenced (not supported).
[in] n  The number of rows/columns of the tile C. N >= 0.
[in] k  The number of elementary reflectors whose product defines the matrix Q. K >= 0.
[in] ib  The inner-blocking size. IB >= 0.
[in] nb  The blocking size. NB >= 0.
[in] A  The i-th column must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by CORE_zgeqrt in the first k columns of its array argument A.
[in] lda  The leading dimension of the array A. LDA >= max(1,N).
[in] T  The IB-by-K triangular factor T of the block reflector. T is upper triangular by block (economic storage); the rest of the array is not referenced.
[in] ldt  The leading dimension of the array T. LDT >= IB.
[in,out] C  On entry, the Hermitian N-by-N tile C. On exit, C is overwritten by Q**H*C*Q.
[in] ldc  The leading dimension of the array C. LDC >= max(1,N).
[in,out] WORK  Workspace array. Its first N columns (leading dimension LDWORK) receive the full Hermitian tile rebuilt from the referenced triangle of C; the part starting at WORK + NB*LDWORK is passed as workspace to CORE_zunmqr (or CORE_zunmlq).
[in] ldwork  The leading dimension of the array WORK. LDWORK >= max(1,N).
Return values:
PLASMA_SUCCESS  successful exit
< 0  if -i, the i-th argument had an illegal value

Definition at line 110 of file core_zherfb.c.

References CORE_zunmlq(), CORE_zunmqr(), PlasmaConjTrans, PlasmaLeft, PlasmaLower, PlasmaNoTrans, and PlasmaRight.

{
    int i, j;

    if (uplo == PlasmaLower) {
        /* Rebuild the Hermitian block: WORK <- C */
        for (j = 0; j < n; j++)
            for (i = j; i < n; i++) {
                *(WORK + i + j * ldwork) = *(C + i + j * ldc);
                if (i > j) {
                    /* Mirror the strictly-lower entry and conjugate it */
                    *(WORK + j + i * ldwork) = *(WORK + i + j * ldwork);
#ifdef COMPLEX
                    LAPACKE_zlacgv_work(1, WORK + j + i * ldwork, ldwork);
#endif
                }
            }

        /* Left: WORK <- Q**H * WORK */
        CORE_zunmqr(PlasmaLeft, PlasmaConjTrans, n, n, k, ib,
                    A, lda, T, ldt, WORK, ldwork, WORK + nb * ldwork, ldwork);
        /* Right: WORK <- WORK * Q */
        CORE_zunmqr(PlasmaRight, PlasmaNoTrans, n, n, k, ib,
                    A, lda, T, ldt, WORK, ldwork, WORK + nb * ldwork, ldwork);

        /* Copy the final result back to the lower part of C: C = WORK */
        for (j = 0; j < n; j++)
            for (i = j; i < n; i++)
                *(C + i + j * ldc) = *(WORK + i + j * ldwork);
    }
    else {
        /* Rebuild the Hermitian block: WORK <- C */
        for (i = 0; i < n; i++)
            for (j = i; j < n; j++) {
                *(WORK + i + j * ldwork) = *(C + i + j * ldc);
                if (j > i) {
                    /* Mirror the strictly-upper entry and conjugate it */
                    *(WORK + j + i * ldwork) = *(WORK + i + j * ldwork);
#ifdef COMPLEX
                    LAPACKE_zlacgv_work(1, WORK + j + i * ldwork, ldwork);
#endif
                }
            }

        /* Right */
        CORE_zunmlq(PlasmaRight, PlasmaConjTrans, n, n, k, ib,
                    A, lda, T, ldt, WORK, ldwork, WORK + nb * ldwork, ldwork);
        /* Left */
        CORE_zunmlq(PlasmaLeft, PlasmaNoTrans, n, n, k, ib,
                    A, lda, T, ldt, WORK, ldwork, WORK + nb * ldwork, ldwork);

        /* Copy the final result back to the upper part of C: C = WORK */
        for (i = 0; i < n; i++)
            for (j = i; j < n; j++)
                *(C + i + j * ldc) = *(WORK + i + j * ldwork);
    }
    return 0;
}


void CORE_zherfb_quark ( Quark *  quark)

Definition at line 215 of file core_zherfb.c.

References A, C, CORE_zherfb(), quark_unpack_args_13, T, and uplo.

{
    PLASMA_enum uplo;
    int n;
    int k;
    int ib;
    int nb;
    PLASMA_Complex64_t *A;
    int lda;
    PLASMA_Complex64_t *T;
    int ldt;
    PLASMA_Complex64_t *C;
    int ldc;
    PLASMA_Complex64_t *WORK;
    int ldwork;

    quark_unpack_args_13(quark, uplo, n, k, ib, nb, A, lda, T, ldt, C, ldc, WORK, ldwork);
    CORE_zherfb(uplo, n, k, ib, nb, A, lda, T, ldt, C, ldc, WORK, ldwork);
}


void QUARK_CORE_zherfb ( Quark *  quark,
Quark_Task_Flags *  task_flags,
PLASMA_enum  uplo,
int  n,
int  k,
int  ib,
int  nb,
PLASMA_Complex64_t *  A,
int  lda,
PLASMA_Complex64_t *  T,
int  ldt,
PLASMA_Complex64_t *  C,
int  ldc 
)

This kernel is a temporary workaround; it will eventually be deleted and replaced by the one above (Piotr's task).

Definition at line 183 of file core_zherfb.c.

References CORE_zherfb_quark(), INOUT, INPUT, PlasmaUpper, QUARK_Insert_Task(), QUARK_REGION_D, QUARK_REGION_L, QUARK_REGION_U, SCRATCH, and VALUE.

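{
    QUARK_Insert_Task(
        quark, CORE_zherfb_quark, task_flags,
        sizeof(PLASMA_enum),                &uplo, VALUE,
        sizeof(int),                        &n,    VALUE,
        sizeof(int),                        &k,    VALUE,
        sizeof(int),                        &ib,   VALUE,
        sizeof(int),                        &nb,   VALUE,
        sizeof(PLASMA_Complex64_t)*nb*nb,   A,     (uplo == PlasmaUpper) ? INOUT|QUARK_REGION_U : INOUT|QUARK_REGION_L,
        sizeof(int),                        &lda,  VALUE,
        sizeof(PLASMA_Complex64_t)*ib*nb,   T,     INPUT,
        sizeof(int),                        &ldt,  VALUE,
        sizeof(PLASMA_Complex64_t)*nb*nb,   C,     (uplo == PlasmaUpper) ? INOUT|QUARK_REGION_D|QUARK_REGION_U : INOUT|QUARK_REGION_D|QUARK_REGION_L,
        sizeof(int),                        &ldc,  VALUE,
        sizeof(PLASMA_Complex64_t)*2*nb*nb, NULL,  SCRATCH,
        sizeof(int),                        &nb,   VALUE,  /* ldwork */
        0);
}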
