I'm trying to use OpenMP to launch multiple instances of PLASMA with heterogeneous initializations, i.e. each instance with a different number of worker threads. In our test code, two OpenMP threads each launch one instance of PLASMA. Both instances do run concurrently, but the worker threads created by each instance are bound to cores starting from the same core. For example, the first OpenMP thread initializes PLASMA with 4 cores (PLASMA_INIT(4,plas_info)) and the second initializes it with 2 cores (PLASMA_INIT(2,plas_info)). Initialization, allocation and execution all proceed concurrently with no errors, but the 4 worker threads of the first instance are bound to cores 0, 1, 2, 3 and the 2 worker threads of the second instance are bound to cores 0 and 1. It seems that each instance of PLASMA respects its own allocation of cores and does not disrupt the OpenMP threads, but always binds its threads starting from core 0, so in this case the two instances overlap on cores 0 and 1.
Inspection of the PLASMA source seems to indicate that PLASMA_INIT_AFFINITY accepts an array specifying the cores to which the worker threads should be bound. Indeed, a small hack that lets us call this routine from FORTRAN allows us to specify which cores the workers are bound to, which relieves the problem. It is unfortunate to have to resort to this hack to get the library to behave as we want, and it seems odd that the function is available in C but has no corresponding FORTRAN interface. It is also clear that our haphazard core binding will not be optimal for execution, since we are not considering the physical location of the cores. Are there plans to extend this functionality to the FORTRAN API?
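For reference, the kind of core list we have in mind can be built along the following lines. This is only a minimal sketch in C, not our actual hack: the helper names build_bindtab and first_core_for_instance are made up for illustration, and it assumes the affinity routine takes a core count plus an integer array of core IDs, as our reading of the source suggests. It simply gives each instance a consecutive, non-overlapping block of cores and, like our current binding, ignores the physical layout of the machine.

```c
#include <stddef.h>

/* Hypothetical helper: first core for instance `idx`, computed as the
 * sum of the worker counts of all earlier instances, so that the
 * blocks of cores handed to each instance do not overlap. */
int first_core_for_instance(const int *workers, int idx)
{
    int offset = 0;
    for (int i = 0; i < idx; i++)
        offset += workers[i];
    return offset;
}

/* Hypothetical helper: fill bindtab[0..ncores-1] with consecutive core
 * IDs starting at first_core. Returns 0 on success, or -1 if the
 * arguments are invalid or the block would run past total_cores. */
int build_bindtab(int first_core, int ncores, int total_cores, int *bindtab)
{
    if (bindtab == NULL || first_core < 0 || ncores <= 0 ||
        first_core + ncores > total_cores)
        return -1;

    for (int i = 0; i < ncores; i++)
        bindtab[i] = first_core + i;

    /* Assumption: the resulting array would then be handed to the
     * affinity-aware init routine in place of the plain init call. */
    return 0;
}
```

With worker counts of 4 and 2 as in our test, the first instance would get cores 0, 1, 2, 3 and the second would get cores 4 and 5, which removes the overlap described above. Each OpenMP thread would build its own array from its thread index before initializing its instance.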