For solving a linear system of the form AX = B, there are two options:
magma_int_t magma_dgesv (magma_int_t n, magma_int_t nrhs, double *A, magma_int_t lda, magma_int_t *ipiv, double *B, magma_int_t ldb, magma_int_t *info)
magma_int_t magma_dgesv_gpu (magma_int_t n, magma_int_t nrhs, magmaDouble_ptr dA, magma_int_t ldda, magma_int_t *ipiv, magmaDouble_ptr dB, magma_int_t lddb, magma_int_t *info)
With the latter (magma_dgesv_gpu), I can use pinned CPU memory and asynchronous copies.
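A minimal sketch of what I mean (untested, error checking omitted): allocate the host buffers with MAGMA's pinned allocator so the host-to-device and device-to-host transfers around magma_dgesv_gpu can run asynchronously. All variable names here are illustrative.

```c
/* Sketch: pinned host memory + async copies around magma_dgesv_gpu.
 * Assumes MAGMA v2 API; error checking omitted for brevity. */
#include "magma_v2.h"

void solve_on_gpu( magma_int_t n, magma_int_t nrhs )
{
    double *hA, *hB;                        /* pinned host buffers */
    magmaDouble_ptr dA, dB;                 /* device buffers */
    magma_int_t *ipiv, info;
    magma_int_t lda  = n;
    magma_int_t ldda = magma_roundup( n, 32 );
    magma_queue_t queue;

    magma_init();
    magma_queue_create( 0, &queue );

    magma_dmalloc_pinned( &hA, lda*n    );  /* pinned => async transfers possible */
    magma_dmalloc_pinned( &hB, lda*nrhs );
    magma_dmalloc( &dA, ldda*n    );
    magma_dmalloc( &dB, ldda*nrhs );
    magma_imalloc_cpu( &ipiv, n );

    /* ... fill hA and hB on the host ... */

    /* async copies can overlap other host work because hA/hB are pinned */
    magma_dsetmatrix_async( n, n,    hA, lda, dA, ldda, queue );
    magma_dsetmatrix_async( n, nrhs, hB, lda, dB, ldda, queue );
    magma_queue_sync( queue );

    magma_dgesv_gpu( n, nrhs, dA, ldda, ipiv, dB, ldda, &info );

    /* copy the solution X (overwriting B) back to the host */
    magma_dgetmatrix_async( n, nrhs, dB, ldda, hB, lda, queue );
    magma_queue_sync( queue );

    magma_free_pinned( hA );  magma_free_pinned( hB );
    magma_free( dA );  magma_free( dB );
    magma_free_cpu( ipiv );
    magma_queue_destroy( queue );
    magma_finalize();
}
```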
I would like to know whether there is an analogous version of the subroutine magma_dsygvdx_m (a magma_dsygvdx_m_gpu?), where magma_dsygvdx_m is called as follows:
magma_dsygvdx_m (magma_int_t ngpu, magma_int_t itype, magma_vec_t jobz, magma_range_t range, magma_uplo_t uplo, magma_int_t n, double *A, magma_int_t lda, double *B, magma_int_t ldb, double vl, double vu, magma_int_t il, magma_int_t iu, magma_int_t *m, double *w, double *work, magma_int_t lwork, magma_int_t *iwork, magma_int_t liwork, magma_int_t *info)
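For reference, a typical call sequence for this routine looks like the sketch below (untested), including the LAPACK-style workspace query with lwork = liwork = -1. The matrices A and B live in ordinary host memory; itype = 1 selects the problem A x = lambda B x, and the il/iu choice here is just an example.

```c
/* Sketch: magma_dsygvdx_m with a workspace query, computing the lowest
 * half of the spectrum of A x = lambda B x. Names are illustrative. */
#include "magma_v2.h"

void eig_range( magma_int_t ngpu, magma_int_t n,
                double *A, double *B, double *w )
{
    magma_int_t il = 1, iu = n/2;           /* example: lowest half of spectrum */
    magma_int_t m, info, lwork = -1, liwork = -1;
    double      work_query;
    magma_int_t iwork_query;

    /* workspace query: optimal sizes come back in work_query / iwork_query */
    magma_dsygvdx_m( ngpu, 1, MagmaVec, MagmaRangeI, MagmaLower, n,
                     A, n, B, n, 0., 0., il, iu, &m, w,
                     &work_query, lwork, &iwork_query, liwork, &info );

    lwork  = (magma_int_t) work_query;
    liwork = iwork_query;

    double *work;
    magma_int_t *iwork;
    magma_dmalloc_pinned( &work, lwork );   /* pinned workspace may help internal transfers */
    magma_imalloc_cpu( &iwork, liwork );

    magma_dsygvdx_m( ngpu, 1, MagmaVec, MagmaRangeI, MagmaLower, n,
                     A, n, B, n, 0., 0., il, iu, &m, w,
                     work, lwork, iwork, liwork, &info );
    /* on return: m eigenvalues in w, eigenvectors overwrite A */

    magma_free_pinned( work );
    magma_free_cpu( iwork );
}
```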
My ultimate purpose is to reduce data transfers and improve efficiency.
Pinned memory for diagonalization dsygvd (Divide and conquer)
Re: Pinned memory for diagonalization dsygvd (Divide and conquer)
Because of the complexity of managing an array distributed across multiple GPUs, we don't currently have a version of sygvdx_m where the matrix is given on the GPUs (i.e., sygvdx_mgpu).
I do recommend trying the 2-stage version, dsygvdx_2stage_m, which is often faster.
-mark
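[Editor's note] To my understanding, magma_dsygvdx_2stage_m takes the same argument list as magma_dsygvdx_m, so trying the 2-stage version suggested above should be a one-line change (a sketch, not from the original reply):

```c
/* Sketch: same arguments as magma_dsygvdx_m, only the routine name changes */
magma_dsygvdx_2stage_m( ngpu, itype, jobz, range, uplo, n,
                        A, lda, B, ldb, vl, vu, il, iu,
                        &m, w, work, lwork, iwork, liwork, &info );
```

Note that the optimal lwork/liwork may differ between the two routines, so the workspace query should be repeated with the 2-stage routine rather than reusing sizes from magma_dsygvdx_m.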