MAGMA
2.3.0
Matrix Algebra for GPU and Multicore Architectures

Functions  
magma_int_t  magma_cpotrf (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex *A, magma_int_t lda, magma_int_t *info) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
magma_int_t  magma_cpotrf3_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t m, magma_int_t n, magma_int_t off_i, magma_int_t off_j, magma_int_t nb, magmaFloatComplex_ptr d_lA[], magma_int_t ldda, magmaFloatComplex_ptr d_lP[], magma_int_t lddp, magmaFloatComplex *A, magma_int_t lda, magma_int_t h, magma_queue_t queues[][3], magma_event_t events[][5], magma_int_t *info) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t  magma_cpotrf_gpu (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex_ptr dA, magma_int_t ldda, magma_int_t *info) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t  magma_cpotrf_m (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaFloatComplex *A, magma_int_t lda, magma_int_t *info) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
magma_int_t  magma_cpotrf_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaFloatComplex_ptr d_lA[], magma_int_t ldda, magma_int_t *info) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t  magma_dpotrf (magma_uplo_t uplo, magma_int_t n, double *A, magma_int_t lda, magma_int_t *info) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
magma_int_t  magma_dpotrf3_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t m, magma_int_t n, magma_int_t off_i, magma_int_t off_j, magma_int_t nb, magmaDouble_ptr d_lA[], magma_int_t ldda, magmaDouble_ptr d_lP[], magma_int_t lddp, double *A, magma_int_t lda, magma_int_t h, magma_queue_t queues[][3], magma_event_t events[][5], magma_int_t *info) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_dpotrf_gpu (magma_uplo_t uplo, magma_int_t n, magmaDouble_ptr dA, magma_int_t ldda, magma_int_t *info) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_dpotrf_m (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, double *A, magma_int_t lda, magma_int_t *info) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
magma_int_t  magma_dpotrf_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaDouble_ptr d_lA[], magma_int_t ldda, magma_int_t *info) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_spotrf (magma_uplo_t uplo, magma_int_t n, float *A, magma_int_t lda, magma_int_t *info) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
magma_int_t  magma_spotrf3_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t m, magma_int_t n, magma_int_t off_i, magma_int_t off_j, magma_int_t nb, magmaFloat_ptr d_lA[], magma_int_t ldda, magmaFloat_ptr d_lP[], magma_int_t lddp, float *A, magma_int_t lda, magma_int_t h, magma_queue_t queues[][3], magma_event_t events[][5], magma_int_t *info) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_spotrf_gpu (magma_uplo_t uplo, magma_int_t n, magmaFloat_ptr dA, magma_int_t ldda, magma_int_t *info) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_spotrf_m (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, float *A, magma_int_t lda, magma_int_t *info) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
magma_int_t  magma_spotrf_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaFloat_ptr d_lA[], magma_int_t ldda, magma_int_t *info) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
magma_int_t  magma_zpotrf (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex *A, magma_int_t lda, magma_int_t *info) 
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
magma_int_t  magma_zpotrf3_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t m, magma_int_t n, magma_int_t off_i, magma_int_t off_j, magma_int_t nb, magmaDoubleComplex_ptr d_lA[], magma_int_t ldda, magmaDoubleComplex_ptr d_lP[], magma_int_t lddp, magmaDoubleComplex *A, magma_int_t lda, magma_int_t h, magma_queue_t queues[][3], magma_event_t events[][5], magma_int_t *info) 
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t  magma_zpotrf_gpu (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex_ptr dA, magma_int_t ldda, magma_int_t *info) 
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t  magma_zpotrf_m (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex *A, magma_int_t lda, magma_int_t *info) 
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
magma_int_t  magma_zpotrf_mgpu (magma_int_t ngpu, magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex_ptr d_lA[], magma_int_t ldda, magma_int_t *info) 
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
magma_int_t magma_cpotrf  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magmaFloatComplex *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine.
The factorization has the form A = U**H * U, if uplo = MagmaUpper, or A = L * L**H, if uplo = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
This uses multiple queues to overlap communication and computation.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  COMPLEX array, dimension (LDA,N). On entry, the Hermitian matrix A. If uplo = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If uplo = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**H * U or A = L * L**H. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_cpotrf3_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  m,  
magma_int_t  n,  
magma_int_t  off_i,  
magma_int_t  off_j,  
magma_int_t  nb,  
magmaFloatComplex_ptr  d_lA[],  
magma_int_t  ldda,  
magmaFloatComplex_ptr  d_lP[],  
magma_int_t  lddp,  
magmaFloatComplex *  A,  
magma_int_t  lda,  
magma_int_t  h,  
magma_queue_t  queues[][3],  
magma_event_t  events[][5],  
magma_int_t *  info  
) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
Auxiliary subroutine for cpotrf2_ooc. It is a multi-GPU interface for computing the Cholesky factorization of a "rectangular" submatrix.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  m  INTEGER The number of rows of the submatrix to be factorized. 
[in]  n  INTEGER The number of columns of the submatrix to be factorized. 
[in]  off_i  INTEGER The first row index of the submatrix to be factorized. 
[in]  off_j  INTEGER The first column index of the submatrix to be factorized. 
[in]  nb  INTEGER The block size used for the factorization and distribution. 
[in,out]  d_lA  COMPLEX array of pointers on the GPU, dimension (ngpu). On entry, the Hermitian matrix dA distributed over the GPUs (d_lA[d] points to the local matrix on the d-th GPU). If UPLO = MagmaLower or MagmaUpper, it respectively uses a 1D block column or row cyclic format (with the block size nb), and each local matrix is stored by column. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[in,out]  d_lP  COMPLEX array of pointers on the GPU, dimension (ngpu). d_lP[d] points to a workspace of size h*lddp*nb on the d-th GPU. 
[in]  lddp  INTEGER The leading dimension of the array d_lP. LDDP >= max(1,N). 
[in,out]  A  COMPLEX array on the CPU, dimension (LDA,H*NB). On exit, the panel is copied back to the CPU. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
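[in]  h  INTEGER Specifies the size of the CPU workspace A, which has H*NB columns. 
[in]  queues  magma_queue_t queues is of dimension (ngpu,3) and contains the queues used for the partial factorization. 
[in]  events  magma_event_t events is of dimension (ngpu,5) and contains the events used for the partial factorization. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.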
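The workspace dimensions in the parameter list (h*lddp*nb elements per d_lP[d], and an LDA-by-H*NB CPU array A) translate directly into allocation sizes. A minimal sketch of that arithmetic, counted in elements; the struct and function names are hypothetical, and converting to bytes means multiplying by the element size (8 bytes for single-precision complex).

```c
#include <stddef.h>

/* Hypothetical sizing helper (not part of MAGMA), derived from the parameter
 * descriptions: each d_lP[d] holds h*lddp*nb elements on GPU d, and the CPU
 * workspace A has dimension (lda, h*nb). Sizes are element counts. */
typedef struct {
    size_t gpu_panel_elems;   /* per GPU, for d_lP[d] */
    size_t cpu_work_elems;    /* for the host array A */
} potrf3_workspace;

potrf3_workspace potrf3_workspace_size(int h, int nb, int lddp, int lda)
{
    potrf3_workspace w;
    /* Cast before multiplying so large sizes do not overflow int. */
    w.gpu_panel_elems = (size_t)h * (size_t)lddp * (size_t)nb;
    w.cpu_work_elems  = (size_t)lda * (size_t)h * (size_t)nb;
    return w;
}
```

For example, h = 2, nb = 64, lddp = 1024 and lda = 1000 give 131072 elements per GPU panel workspace and 128000 elements for the host workspace.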

magma_int_t magma_cpotrf_gpu  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magmaFloatComplex_ptr  dA,  
magma_int_t  ldda,  
magma_int_t *  info  
) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  dA  COMPLEX array on the GPU, dimension (LDDA,N). On entry, the Hermitian matrix dA. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.
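The LDDA requirement combines a hard constraint (LDDA >= max(1,N)) with a performance recommendation (a multiple of 16). A minimal sketch of how a caller might choose such a value before allocating dA; `choose_ldda` is a hypothetical helper, not a MAGMA function, and the alignment of 16 is taken directly from this page.

```c
/* Hypothetical helper (not a MAGMA function): pick a device leading
 * dimension that satisfies LDDA >= max(1,N) and is a multiple of 16,
 * the alignment recommended above for coalesced accesses. */
int choose_ldda(int n)
{
    int m = n > 1 ? n : 1;          /* enforce LDDA >= max(1,N) */
    return ((m + 15) / 16) * 16;    /* round up to the next multiple of 16 */
}
```

Padding only changes the stride between columns; the routine still reads just the leading N-by-N part, so the extra rows can stay uninitialized.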

magma_int_t magma_cpotrf_m  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
magmaFloatComplex *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine. The matrix A may be larger than the available GPU memory.
The factorization has the form A = U**H * U, if UPLO = MagmaUpper, or A = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  COMPLEX array, dimension (LDA,N). On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**H * U or A = L * L**H. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_cpotrf_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
magmaFloatComplex_ptr  d_lA[],  
magma_int_t  ldda,  
magma_int_t *  info  
) 
CPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  d_lA  COMPLEX array of pointers on the GPU, dimension (ngpu). On entry, the Hermitian matrix dA distributed over GPUs (d_lA[d] points to the local matrix on the d-th GPU). It is distributed in 1D block column or row cyclic format (with the block size nb) if UPLO = MagmaUpper or MagmaLower, respectively. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H. 
[in]  ldda  INTEGER The leading dimension of the array d_lA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.
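In a 1D block-cyclic layout, consecutive nb-wide blocks of columns (UPLO = MagmaUpper) or rows (UPLO = MagmaLower) are dealt to the GPUs round-robin. The index arithmetic can be sketched as follows. The helper names are hypothetical, and nb must match the block size the routine expects, which this page does not specify; the caller has to confirm it.

```c
/* Hypothetical helpers (not MAGMA functions) for a 1D block-cyclic layout
 * with block size nb over ngpu devices. The index g is a global column
 * (UPLO = MagmaUpper) or a global row (UPLO = MagmaLower). */
typedef struct { int dev; int local; } cyclic_loc;

cyclic_loc cyclic_global_to_local(int g, int nb, int ngpu)
{
    int block = g / nb;                       /* global block index */
    cyclic_loc loc;
    loc.dev   = block % ngpu;                 /* blocks dealt round-robin */
    loc.local = (block / ngpu) * nb + g % nb; /* position on that device */
    return loc;
}

/* Number of columns (or rows) of an n-wide dimension stored on device d. */
int cyclic_local_count(int n, int nb, int ngpu, int d)
{
    int nfull = n / nb, rem = n % nb;
    int count = (nfull / ngpu + (d < nfull % ngpu ? 1 : 0)) * nb;
    if (rem > 0 && d == nfull % ngpu)
        count += rem;                         /* trailing partial block */
    return count;
}
```

With nb = 2 and three GPUs, global index 7 falls in block 3 and lands on GPU 0 at local index 3; an 11-wide dimension splits as 4, 4 and 3 entries across GPUs 0, 1 and 2.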

magma_int_t magma_dpotrf  (  magma_uplo_t  uplo, 
magma_int_t  n,  
double *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine.
The factorization has the form A = U**T * U, if uplo = MagmaUpper, or A = L * L**T, if uplo = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
This uses multiple queues to overlap communication and computation.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  DOUBLE PRECISION array, dimension (LDA,N). On entry, the symmetric matrix A. If uplo = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If uplo = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**T * U or A = L * L**T. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_dpotrf3_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  m,  
magma_int_t  n,  
magma_int_t  off_i,  
magma_int_t  off_j,  
magma_int_t  nb,  
magmaDouble_ptr  d_lA[],  
magma_int_t  ldda,  
magmaDouble_ptr  d_lP[],  
magma_int_t  lddp,  
double *  A,  
magma_int_t  lda,  
magma_int_t  h,  
magma_queue_t  queues[][3],  
magma_event_t  events[][5],  
magma_int_t *  info  
) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
Auxiliary subroutine for dpotrf2_ooc. It is a multi-GPU interface for computing the Cholesky factorization of a "rectangular" submatrix.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  m  INTEGER The number of rows of the submatrix to be factorized. 
[in]  n  INTEGER The number of columns of the submatrix to be factorized. 
[in]  off_i  INTEGER The first row index of the submatrix to be factorized. 
[in]  off_j  INTEGER The first column index of the submatrix to be factorized. 
[in]  nb  INTEGER The block size used for the factorization and distribution. 
[in,out]  d_lA  DOUBLE PRECISION array of pointers on the GPU, dimension (ngpu). On entry, the symmetric matrix dA distributed over the GPUs (d_lA[d] points to the local matrix on the d-th GPU). If UPLO = MagmaLower or MagmaUpper, it respectively uses a 1D block column or row cyclic format (with the block size nb), and each local matrix is stored by column. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**T * U or dA = L * L**T. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[in,out]  d_lP  DOUBLE PRECISION array of pointers on the GPU, dimension (ngpu). d_lP[d] points to a workspace of size h*lddp*nb on the d-th GPU. 
[in]  lddp  INTEGER The leading dimension of the array d_lP. LDDP >= max(1,N). 
[in,out]  A  DOUBLE PRECISION array on the CPU, dimension (LDA,H*NB). On exit, the panel is copied back to the CPU. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
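[in]  h  INTEGER Specifies the size of the CPU workspace A, which has H*NB columns. 
[in]  queues  magma_queue_t queues is of dimension (ngpu,3) and contains the queues used for the partial factorization. 
[in]  events  magma_event_t events is of dimension (ngpu,5) and contains the events used for the partial factorization. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.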

magma_int_t magma_dpotrf_gpu  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magmaDouble_ptr  dA,  
magma_int_t  ldda,  
magma_int_t *  info  
) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  dA  DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, the symmetric matrix dA. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**T * U or dA = L * L**T. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_dpotrf_m  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
double *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine. The matrix A may be larger than the available GPU memory.
The factorization has the form A = U**T * U, if UPLO = MagmaUpper, or A = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  DOUBLE PRECISION array, dimension (LDA,N). On entry, the symmetric matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**T * U or A = L * L**T. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_dpotrf_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
magmaDouble_ptr  d_lA[],  
magma_int_t  ldda,  
magma_int_t *  info  
) 
DPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  d_lA  DOUBLE PRECISION array of pointers on the GPU, dimension (ngpu). On entry, the symmetric matrix dA distributed over GPUs (d_lA[d] points to the local matrix on the d-th GPU). It is distributed in 1D block column or row cyclic format (with the block size nb) if UPLO = MagmaUpper or MagmaLower, respectively. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**T * U or dA = L * L**T. 
[in]  ldda  INTEGER The leading dimension of the array d_lA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_spotrf  (  magma_uplo_t  uplo, 
magma_int_t  n,  
float *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine.
The factorization has the form A = U**T * U, if uplo = MagmaUpper, or A = L * L**T, if uplo = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
This uses multiple queues to overlap communication and computation.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  REAL array, dimension (LDA,N). On entry, the symmetric matrix A. If uplo = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If uplo = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**T * U or A = L * L**T. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_spotrf3_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  m,  
magma_int_t  n,  
magma_int_t  off_i,  
magma_int_t  off_j,  
magma_int_t  nb,  
magmaFloat_ptr  d_lA[],  
magma_int_t  ldda,  
magmaFloat_ptr  d_lP[],  
magma_int_t  lddp,  
float *  A,  
magma_int_t  lda,  
magma_int_t  h,  
magma_queue_t  queues[][3],  
magma_event_t  events[][5],  
magma_int_t *  info  
) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
Auxiliary subroutine for spotrf2_ooc. It is a multi-GPU interface for computing the Cholesky factorization of a "rectangular" submatrix.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  m  INTEGER The number of rows of the submatrix to be factorized. 
[in]  n  INTEGER The number of columns of the submatrix to be factorized. 
[in]  off_i  INTEGER The first row index of the submatrix to be factorized. 
[in]  off_j  INTEGER The first column index of the submatrix to be factorized. 
[in]  nb  INTEGER The block size used for the factorization and distribution. 
[in,out]  d_lA  REAL array of pointers on the GPU, dimension (ngpu). On entry, the symmetric matrix dA distributed over the GPUs (d_lA[d] points to the local matrix on the d-th GPU). If UPLO = MagmaLower or MagmaUpper, it respectively uses a 1D block column or row cyclic format (with the block size nb), and each local matrix is stored by column. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**T * U or dA = L * L**T. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[in,out]  d_lP  REAL array of pointers on the GPU, dimension (ngpu). d_lP[d] points to a workspace of size h*lddp*nb on the d-th GPU. 
[in]  lddp  INTEGER The leading dimension of the array d_lP. LDDP >= max(1,N). 
[in,out]  A  REAL array on the CPU, dimension (LDA,H*NB). On exit, the panel is copied back to the CPU. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
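[in]  h  INTEGER Specifies the size of the CPU workspace A, which has H*NB columns. 
[in]  queues  magma_queue_t queues is of dimension (ngpu,3) and contains the queues used for the partial factorization. 
[in]  events  magma_event_t events is of dimension (ngpu,5) and contains the events used for the partial factorization. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.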

magma_int_t magma_spotrf_gpu  (  magma_uplo_t  uplo, 
magma_int_t  n,  
magmaFloat_ptr  dA,  
magma_int_t  ldda,  
magma_int_t *  info  
) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  dA  REAL array on the GPU, dimension (LDDA,N). On entry, the symmetric matrix dA. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**T * U or dA = L * L**T. 
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16. 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_spotrf_m  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
float *  A,  
magma_int_t  lda,  
magma_int_t *  info  
) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A.
This version does not require GPU workspace to be passed as input; GPU memory is allocated inside the routine. The matrix A may be larger than the available GPU memory.
The factorization has the form A = U**T * U, if UPLO = MagmaUpper, or A = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  REAL array, dimension (LDA,N). On entry, the symmetric matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**T * U or A = L * L**T. Higher performance is achieved if A is in pinned memory, e.g. allocated using magma_malloc_pinned. 
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[out]  info  INTEGER = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a memory allocation failure; > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.

magma_int_t magma_spotrf_mgpu  (  magma_int_t  ngpu, 
magma_uplo_t  uplo,  
magma_int_t  n,  
magmaFloat_ptr  d_lA[],  
magma_int_t  ldda,  
magma_int_t *  info  
) 
SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix dA.
The factorization has the form dA = U**T * U, if UPLO = MagmaUpper, or dA = L * L**T, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t

[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  d_lA  REAL array of pointers on the GPU, dimension (ngpu) On entry, the symmetric matrix dA distributed over GPUs (d_lA[d] points to the local matrix on the dth GPU). It is distributed in 1D block column or row cyclic (with the block size of nb) if UPLO = MagmaUpper or MagmaLower, respectively. If UPLO = MagmaUpper, the leading NbyN upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading NbyN lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H. 
[in]  ldda  INTEGER The leading dimension of the array d_lA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
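[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.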

magma_int_t magma_zpotrf(
    magma_uplo_t uplo,
    magma_int_t n,
    magmaDoubleComplex *A,
    magma_int_t lda,
    magma_int_t *info )
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
This version does not require GPU workspace to be passed as input; the GPU memory it needs is allocated inside the routine.
The factorization has the form A = U**H * U, if uplo = MagmaUpper, or A = L * L**H, if uplo = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
This uses multiple queues to overlap communication and computation.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  COMPLEX_16 array, dimension (LDA,N). On entry, the Hermitian matrix A. If uplo = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If uplo = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**H * U or A = L * L**H. Higher performance is achieved if A is in pinned memory, e.g., allocated using magma_malloc_pinned.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
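[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.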

magma_int_t magma_zpotrf3_mgpu(
    magma_int_t ngpu,
    magma_uplo_t uplo,
    magma_int_t m,
    magma_int_t n,
    magma_int_t off_i,
    magma_int_t off_j,
    magma_int_t nb,
    magmaDoubleComplex_ptr d_lA[],
    magma_int_t ldda,
    magmaDoubleComplex_ptr d_lP[],
    magma_int_t lddp,
    magmaDoubleComplex *A,
    magma_int_t lda,
    magma_int_t h,
    magma_queue_t queues[][3],
    magma_event_t events[][5],
    magma_int_t *info )
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
Auxiliary subroutine for zpotrf2_ooc. It is a multi-GPU interface for computing the Cholesky factorization of a "rectangular" matrix.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  m  INTEGER The number of rows of the submatrix to be factorized. 
[in]  n  INTEGER The number of columns of the submatrix to be factorized. 
[in]  off_i  INTEGER The first row index of the submatrix to be factorized. 
[in]  off_j  INTEGER The first column index of the submatrix to be factorized. 
[in]  nb  INTEGER The block size used for the factorization and distribution. 
[in,out]  d_lA  COMPLEX_16 array of pointers on the GPU, dimension (ngpu). On entry, the Hermitian matrix dA distributed over GPUs (d_lA[d] points to the local matrix on the d-th GPU). If UPLO = MagmaLower or MagmaUpper, it respectively uses a 1D block column or row cyclic format (with the block size nb), and each local matrix is stored by column. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H.
[in]  ldda  INTEGER The leading dimension of the array d_lA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
[in,out]  d_lP  COMPLEX_16 array of pointers on the GPU, dimension (ngpu). d_lP[d] points to a workspace of size h*lddp*nb on the d-th GPU.
[in]  lddp  INTEGER The leading dimension of the array d_lP. LDDP >= max(1,N).
[in,out]  A  COMPLEX_16 array on the CPU, dimension (LDA,H*NB). On exit, the panel is copied back to the CPU.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
[in]  h  INTEGER Specifies the size of the CPU workspace A, which holds h panels of width nb (A has dimension (LDA,H*NB)).
[in]  queues  magma_queue_t queues is of dimension (ngpu,3) and contains the queues used for the partial factorization. 
[in]  events  magma_event_t events is of dimension (ngpu,5) and contains the events used for the partial factorization.
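[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.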

magma_int_t magma_zpotrf_gpu(
    magma_uplo_t uplo,
    magma_int_t n,
    magmaDoubleComplex_ptr dA,
    magma_int_t ldda,
    magma_int_t *info )
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  dA  COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, the Hermitian matrix dA. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H.
[in]  ldda  INTEGER The leading dimension of the array dA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
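[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.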

magma_int_t magma_zpotrf_m(
    magma_int_t ngpu,
    magma_uplo_t uplo,
    magma_int_t n,
    magmaDoubleComplex *A,
    magma_int_t lda,
    magma_int_t *info )
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix A.
This version does not require work space on the GPU passed as input. GPU memory is allocated in the routine. The matrix A may exceed the GPU memory.
The factorization has the form A = U**H * U, if UPLO = MagmaUpper, or A = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of A is stored; = MagmaLower: Lower triangle of A is stored.
[in]  n  INTEGER The order of the matrix A. N >= 0. 
[in,out]  A  COMPLEX_16 array, dimension (LDA,N). On entry, the Hermitian matrix A. If UPLO = MagmaUpper, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**H * U or A = L * L**H. Higher performance is achieved if A is in pinned memory, e.g., allocated using magma_malloc_pinned.
[in]  lda  INTEGER The leading dimension of the array A. LDA >= max(1,N). 
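[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.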

magma_int_t magma_zpotrf_mgpu(
    magma_int_t ngpu,
    magma_uplo_t uplo,
    magma_int_t n,
    magmaDoubleComplex_ptr d_lA[],
    magma_int_t ldda,
    magma_int_t *info )
ZPOTRF computes the Cholesky factorization of a complex Hermitian positive definite matrix dA.
The factorization has the form dA = U**H * U, if UPLO = MagmaUpper, or dA = L * L**H, if UPLO = MagmaLower, where U is an upper triangular matrix and L is lower triangular.
This is the block version of the algorithm, calling Level 3 BLAS.
[in]  ngpu  INTEGER Number of GPUs to use. ngpu > 0. 
[in]  uplo  magma_uplo_t = MagmaUpper: Upper triangle of dA is stored; = MagmaLower: Lower triangle of dA is stored.
[in]  n  INTEGER The order of the matrix dA. N >= 0. 
[in,out]  d_lA  COMPLEX_16 array of pointers on the GPU, dimension (ngpu). On entry, the Hermitian matrix dA distributed over GPUs (d_lA[d] points to the local matrix on the d-th GPU). It is distributed in a 1D block column cyclic layout if UPLO = MagmaUpper, or a 1D block row cyclic layout if UPLO = MagmaLower, with block size nb (nb is not an argument of this routine; it is the block size the routine selects internally, see magma_get_zpotrf_nb). If UPLO = MagmaUpper, the leading N-by-N upper triangular part of dA contains the upper triangular part of the matrix dA, and the strictly lower triangular part of dA is not referenced. If UPLO = MagmaLower, the leading N-by-N lower triangular part of dA contains the lower triangular part of the matrix dA, and the strictly upper triangular part of dA is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization dA = U**H * U or dA = L * L**H.
[in]  ldda  INTEGER The leading dimension of the array d_lA. LDDA >= max(1,N). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
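[out]  info  INTEGER = 0: successful exit. < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred (e.g., a memory allocation failure). > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed.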
