Thanks for the quick reply. Yes, I hope to use the PLASMA Cholesky routines. Sorry for not explaining this well.
I have been using multithreaded MKL to invert a 24576 X 24576 matrix, which takes quite some time; it runs at about 72 GFLOPS on a 6-core machine.
With a single GPU board (C2050) and the CULA package, I can invert a 12288 X 12288 matrix at around 190 GFLOPS. (The unpacked 24576 X 24576 matrix does not fit in GPU local memory.)
The matrix I would like to invert is 24576 X 24576 and, eventually, 98304 X 98304. Thus I was hoping to use both the CPUs and multiple GPUs
on the same machine. 3 GPU boards are easy. I think this is possible with PLASMA, and I have been testing it.
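For reference, here is a rough sketch of the storage arithmetic behind these sizes. It assumes dense double-precision storage (8 bytes per element). The helper names and the packed-triangle figure are my own illustration, not anything from CULA or PLASMA.

```python
# Back-of-envelope storage for an n x n double-precision matrix.
# Assumption: 8 bytes per element, no padding or workspace counted.
BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3


def full_bytes(n):
    """Bytes for the full (unpacked) n x n matrix."""
    return n * n * BYTES_PER_DOUBLE


def packed_bytes(n):
    """Bytes for one triangle including the diagonal (packed storage)."""
    return n * (n + 1) // 2 * BYTES_PER_DOUBLE


if __name__ == "__main__":
    for n in (12288, 24576, 98304):
        print(f"n={n}: full {full_bytes(n) / GIB:.3f} GiB, "
              f"packed {packed_bytes(n) / GIB:.3f} GiB")
```

The full matrix comes to 1.125 GiB for n=12288, 4.5 GiB for n=24576 and 72 GiB for n=98304. So the 24576 case is already larger than the nominal 3 GB on a C2050, and the 98304 case will not fit on any single board.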
> Then you can use single-threaded PLASMA and multithreaded MKL. In principle it should work. If it does not, I am not sure why.
But somehow I can't make it work:
[katayama@btesla1 plasma]$ export OMP_NUM_THREADS=6
[katayama@btesla1 plasma]$ export MKL_NUM_THREADS=6
[katayama@btesla1 plasma]$ /usr/bin/nohup time ./plasma_dpotri --n_range=24576:24577:2 --nb=6144 --threads=1 --dyn
This only gets up to 100% CPU in top. If I use --threads=6, as I said in my previous post, I get 600% during core_dpotrf but not afterwards.
I would eventually like to control the number of threads in PLASMA independently of OMP_NUM_THREADS/MKL_NUM_THREADS, because I would like to use
PLASMA threads to call GPU gemm routines, for example, while the CPUs are doing, say, dtrtri.