

Posted: Fri Aug 05, 2016 3:24 pm
by V1cSt0ne
I'm trying to correctly link the LAPACK routine for matrix multiplication, dgemm, but I don't think I properly understand the relationship between BLAS and LAPACK. I installed the library with the command

sudo apt-get install liblapack-dev

I got the program to compile. To check whether it actually depended on LAPACK, I removed the -llapack flag and kept only the -lblas flag, and the program still compiled. I'm not sure how dgemm relates to LAPACK: I know it is a BLAS routine, but I assumed LAPACK would provide an optimized version of it. I can't find it in the LAPACK documentation, which makes me think it is an auxiliary function, if it is there at all. Can anyone clarify this?


Posted: Fri Aug 05, 2016 3:40 pm
by Julien Langou
DGEMM is in the BLAS. It is not in LAPACK. So, if your code only needs DGEMM, and you link with only BLAS, then all good indeed.

LAPACK needs BLAS. LAPACK does not provide BLAS. You have to link an external BLAS library alongside LAPACK for the link to succeed.

BLAS does not need LAPACK. DGEMM is in BLAS.

LAPACK does not provide an optimized version of DGEMM. (Since it does not provide DGEMM to start with.)

LAPACK relies on the fact that in optimized BLAS libraries (like ATLAS, vecLib, MKL, OpenBLAS, ACML, etc.) DGEMM is highly tuned and highly efficient. So the algorithms in LAPACK try to call DGEMM on matrices that are as large as possible, as often as possible. This is how LAPACK is efficient. Without an optimized BLAS, most LAPACK routines are not very efficient.

If you link with the BLAS that you got from `apt-get`, I do not think it is optimized. It is likely the reference BLAS that is provided with LAPACK (as a separate library, but we include it in the package).
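For a sense of what an unoptimized BLAS does, here is a sketch of the no-transpose case of the DGEMM contract, C := alpha*A*B + beta*C on column-major arrays with leading dimensions, written as a plain loop nest roughly in the style of the reference implementation. `naive_dgemm` is a hypothetical name, not a BLAS symbol; a real program would instead call the library routine (with a gfortran-built BLAS it is usually visible to C as `dgemm_`, and `cblas_dgemm` is the C interface) and link with `-lblas`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name, not the BLAS symbol. Computes C := alpha*A*B + beta*C for
   column-major A (m x k, lda >= m), B (k x n, ldb >= k), C (m x n, ldc >= m),
   with no transposes, using a simple loop nest. */
void naive_dgemm(int m, int n, int k, double alpha,
                 const double *a, int lda, const double *b, int ldb,
                 double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (k > 1 ? k : 1) && ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        double *cj = c + (size_t)j * ldc;            /* column j of C */
        /* Scale column j of C by beta; beta == 0 overwrites C outright. */
        for (int i = 0; i < m; ++i)
            cj[i] = (beta == 0.0) ? 0.0 : beta * cj[i];
        /* Add alpha * A * B(:, j), one column of A at a time. */
        for (int l = 0; l < k; ++l) {
            double t = alpha * b[l + (size_t)j * ldb];
            const double *al = a + (size_t)l * lda;  /* column l of A */
            for (int i = 0; i < m; ++i)
                cj[i] += t * al[i];
        }
    }
}
```

For example, with A = [1 2; 3 4] stored as `{1, 3, 2, 4}` and B = [5 6; 7 8] stored as `{5, 7, 6, 8}`, `naive_dgemm(2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2)` fills `c` with `{19, 43, 22, 50}`, i.e. C = [19 22; 43 50]. Optimized libraries such as OpenBLAS honor the same contract but block the loops for caches and registers and use hand-written kernels, which is where their speedup comes from.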

Hope this makes some sense.


Posted: Sat Aug 06, 2016 8:26 am
by V1cSt0ne
Yeah, that helps a lot! As far as I can see, there's no ARM implementation of the BLAS libraries on the netlib site, so for an optimized library I should check out OpenBLAS for a quick, well-optimized option, and ATLAS for the best case, assuming I get all the tuning options right.