Today I looked more deeply into why the 'optimized ScaLAPACK libraries' (libsci by Cray, libmkl by Intel) were yielding relatively poor PDGEMM performance. It appears that they ship PILAENV unmodified; at least, it returns a block size of 32. A link-time hack that replaced their pilaenv with one returning a larger value made a big difference in PDGEMM performance (roughly 2x). Wouldn't it be a good idea for the netlib version of PBLAS to provide a somewhat better default? I agree that this is something for vendors to optimize, but in practice they do not seem to.
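
For anyone who wants to try the same thing, here is a minimal sketch of what such a link-time override could look like in C. It assumes the usual Fortran calling convention (lowercase symbol with a trailing underscore, arguments passed by reference) and the netlib signature PILAENV( ICTXT, PREC ). The value 128 and the name OVERRIDE_NB are only illustrative, not a measured optimum; the right block size depends on the machine and the underlying BLAS.

```c
/* Sketch of a replacement for PILAENV, to be linked ahead of the
 * vendor library so that this definition is picked up instead.
 * Assumptions: Fortran symbols are lowercase with a trailing
 * underscore, and arguments are passed by reference. Many Fortran
 * compilers also pass a hidden length for the CHARACTER argument;
 * it is simply ignored here. */

/* Hypothetical block size, chosen for illustration only. */
enum { OVERRIDE_NB = 128 };

int pilaenv_(const int *ictxt, const char *prec)
{
    (void)ictxt; /* BLACS context, unused in this sketch */
    (void)prec;  /* precision flag ('S', 'D', 'C', 'Z'), unused */
    return OVERRIDE_NB;
}
```

Compile this to an object file and put it on the link line before the ScaLAPACK library. Whether the override actually takes effect depends on how the vendor library resolves the symbol, so it is worth checking the value PILAENV returns at run time.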