LAPACK Archives

[Lapack] BLAS & CBLAS Subroutines with OpenMP on NEW multicore/thread CPUs

I've not been able to run my Fortran code with a large matrix solve 
since upgrading to a newer, more capable XEON CPU.  I'm finally 
contacting you after taking my issue to Intel (Fortran compiler gurus), 
but with no resolution. Here's the essence (transcript) of my exchange 
with Intel, including their latest response.

-- My initial email
My new workstation runs RH Release 6.6 with Linux kernel
2.6.32-504.el6.x86_64, with the newest Xeon CPU, plus 64 GB RAM. The
Fortran executable was built using the 64-bit Intel Fortran compiler
with the included MKL libs. Nothing else is different; however, my
Fortran calls to the CBLAS lib (included in the Intel MKL libs) no
longer maximize core/thread usage when I call CGESV (matrix equation
solver). In fact, it looks like core/thread-swapping has gone "nutz", as
shown in the attached screen shot of my System Monitor. Before I
upgraded to the more capable CPU (8 cores/16 threads), the previous CPU
was an earlier-version Xeon with fewer cores/threads (6 cores/6
threads), and it achieved 100% utilization of all the cores/threads when
I called the CGESV routine. The speedup was beautiful, especially when I
was doing a 48,000-unknown (complex) problem. Now, unless I set the
number of threads to 1 in my execution script, it won't run to
completion. Have you seen this behavior (or complaints) with the newer
multicore/multithread CPUs?

--  Intel response
The MKL threaded library defaults to 1 thread per physical core. If you
call it inside an omp parallel region with OMP_NESTED set, this would
oversubscribe the cores and would also interfere with OMP_PROC_BIND or
KMP_AFFINITY. So there are a lot of variables to consider. If you have
MKL set to a single thread and call it inside your omp parallel region,
your performance may still depend on the affinity setting. The MKL
articles on software.intel.com as well as the MKL forum may be useful
references for you.
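
The settings named in the reply above can be pinned explicitly in an
execution script. The following is only a sketch, not a confirmed fix for
the problem reported here: the thread count of 8 assumes the 8 physical
cores mentioned in the first message, the KMP_AFFINITY value is one
commonly suggested setting for hyperthreaded Intel CPUs, and ./solver is
a hypothetical name standing in for the Fortran executable.

```shell
#!/bin/sh
# Sketch of a run script that fixes MKL/OpenMP threading explicitly.
# Assumption: 8 physical cores (16 hardware threads), as described above.
PHYSICAL_CORES=8

# One MKL/OpenMP thread per physical core, not per hardware thread.
export MKL_NUM_THREADS="$PHYSICAL_CORES"
export OMP_NUM_THREADS="$PHYSICAL_CORES"

# Stop MKL from changing the thread count at run time.
export MKL_DYNAMIC=FALSE

# Pin threads so they do not migrate between cores (Intel OpenMP runtime).
export KMP_AFFINITY="granularity=fine,compact,1,0"

# Hypothetical executable name; replace with the real binary.
SOLVER=./solver
if [ -x "$SOLVER" ]; then
    "$SOLVER"
fi
```

Adding ",verbose" to the KMP_AFFINITY value makes the Intel OpenMP
runtime print the thread-to-core binding at startup, which may help
compare the observed placement with the System Monitor display.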


I don't have any OMP in my home-grown Fortran.  It's only inside the 
CBLAS CGESV routine I call...

   -- Vaughn



Core_swapping.png


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 86469 bytes
Desc: not available
URL: 
<http://lists.eecs.utk.edu/mailman/private/lapack/attachments/20150513/c20f16db/attachment-0001.png>
