### ATLAS, BLAS, GotoBLAS: No Speed-Ups

Posted: **Thu Jun 16, 2011 9:35 am**

I installed GotoBLAS2, and I also have Intel Fortran with MKL installed.


I am trying to speed up a code that consists mostly of matrix-vector multiplications and dot products.

As a test, I wrote a simple benchmark that multiplies a random, dense (non-zero, non-sparse) 15000x15000 matrix by a 15000x1 vector. The timing section goes like this:

```fortran
! t1, t2, count_rate are integers; a, x, b are filled beforehand
call system_clock(t1, count_rate)
b = matmul(a, x)
call system_clock(t2)
print *, "MATMUL  :", real(t2 - t1) / real(count_rate), "s"

call system_clock(t1)
b = matmult(n, a, x)
call system_clock(t2)
print *, "MATMULT :", real(t2 - t1) / real(count_rate), "s"

call system_clock(t1)
! beta = 0 so DGEMV overwrites b instead of accumulating into it
call dgemv('N', n, n, 1.0D+00, a, n, x, 1, 0.0D+00, b, 1)
call system_clock(t2)
print *, "DGEMV   :", real(t2 - t1) / real(count_rate), "s"
```
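For context, a minimal self-contained driver around the timings above might look like the sketch below; the allocation, the `random_number` fill, and the timer declarations are assumptions about the parts of the program not shown.

```fortran
program matrix_multi
  implicit none
  integer, parameter :: n = 15000
  double precision, allocatable :: a(:,:), x(:), b(:)
  integer :: t1, t2, count_rate

  allocate(a(n,n), x(n), b(n))
  call random_number(a)        ! dense, nonzero random entries
  call random_number(x)

  call system_clock(t1, count_rate)
  b = matmul(a, x)
  call system_clock(t2)
  print *, "MATMUL :", real(t2 - t1) / real(count_rate), "s"
end program matrix_multi
```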

where MATMULT is:

```fortran
FUNCTION MATMULT(dim, a, x)
  IMPLICIT NONE
  INTEGER :: dim
  DOUBLE PRECISION, DIMENSION(dim,dim) :: a
  DOUBLE PRECISION, DIMENSION(dim) :: x
  DOUBLE PRECISION, DIMENSION(dim) :: MATMULT
  !$omp parallel
  !$omp workshare
  MATMULT(1:dim) = matmul(a(1:dim,1:dim), x(1:dim))
  !$omp end workshare
  !$omp end parallel
END FUNCTION MATMULT
```
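As an aside, gfortran's implementation of WORKSHARE has historically executed the enclosed block on a single thread, so the function above may not actually run in parallel. A hand-rolled alternative (a sketch, not part of the original code) spreads the rows over threads explicitly:

```fortran
FUNCTION MATVEC_OMP(dim, a, x)
  IMPLICIT NONE
  INTEGER :: dim
  DOUBLE PRECISION, DIMENSION(dim,dim) :: a
  DOUBLE PRECISION, DIMENSION(dim) :: x
  DOUBLE PRECISION, DIMENSION(dim) :: MATVEC_OMP
  INTEGER :: i
  ! Each thread computes a block of rows. Note that a(i,:) is a
  ! strided access in column-major Fortran, so this is a simple
  ! illustration rather than a cache-optimal kernel.
  !$omp parallel do
  DO i = 1, dim
     MATVEC_OMP(i) = DOT_PRODUCT(a(i,:), x(1:dim))
  END DO
  !$omp end parallel do
END FUNCTION MATVEC_OMP
```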

I compile and run it as follows (without changing the code at all):

```shell
# reference BLAS, clean (i.e. without OpenMP parallelization)
gfortran matrix_multi.f90 -lblas && ./a.out
# reference BLAS, with OpenMP
gfortran matrix_multi.f90 -lblas -fopenmp && ./a.out
# ATLAS
gfortran matrix_multi.f90 -lf77blas -latlas && ./a.out
# GotoBLAS2
gfortran matrix_multi.f90 -lgoto2 -lpthread && ./a.out
# MKL (from the link-line advisor)
gfortran m.f90 -L$MKLROOT/lib/ia32 -lmkl_blas95 -Wl,--start-group -lmkl_gf -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -m32 -fopenmp
```
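One thing worth checking with the threaded builds (an assumption on my part that the libraries are threaded at all on this install) is how many threads the runtime actually uses; a tiny probe like this, compiled with `-fopenmp`, prints it:

```fortran
program threads
  use omp_lib
  implicit none
  print *, "max threads:", omp_get_max_threads()
end program threads
```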

But for some reason, the intrinsic MATMUL is almost always the fastest, and even when it is not, it is very close to the fastest version. Only with the MKL build is DGEMV consistently faster than the others.

Am I doing something wrong? I am completely confused.

I know very little about installing these libraries; I barely managed to install the packages from Synaptic and link them, so I am a complete n00b at this.

I have already read the related threads, including one comparing LAPACK with Mathematica and MATLAB, but none of them seem to answer my question.

Thanks