Dear experts,

I am a new user of PLASMA (and of linear algebra packages in general). I am trying to compute the inverse and log(det) of large positive definite matrices as fast as possible on a single machine, eventually with multiple GPUs.

I am working with a 24576 X 24576 matrix split into 16 tiles of 6144 X 6144 (a 4 X 4 grid). For now I use MKL for dpotrf, dtrsm, dgemm, etc. on each 6144 X 6144 tile. Eventually I would like to send these tile operations to GPUs using CUBLAS/CULA/MAGMA.

I am testing plasma_dpotrf_tile_async and plasma_dpotri_tile_async for this. For now I want PLASMA to use only one thread, but I want the MKL routines it calls to run multi-threaded so that I can test my idea.

When I set the number of PLASMA cores to one, MKL also seems to run single-threaded (top never goes above 100%).

When I set the number of PLASMA cores to 6, top reaches 600% during the first dpotrf call, but drops to 300% once the three trsm calls start. I take this to mean that each trsm is using 100% of one core.

I looked around the affinity code...

I would like to know how to control the number of PLASMA threads and the number of MKL threads independently, so that I can do what I described above.

Thank you for your help.

Best,

Nobu