I am trying to speed up a code that consists mainly of matrix-vector multiplications and dot products.
I wrote a simple test that multiplies a random 15000x15000 matrix by a 15000x1 vector (both dense, i.e. non-zero / non-sparse), which goes like this:
- Code:
call system_clock(t1, rate)   ! rate = clock ticks per second, needed to convert counts to seconds
b = matmul(a,x)
call system_clock(t2)
print*,"MATMUL  :", real(t2-t1)/real(rate), " s"
call system_clock(t1)
b = matmult(n,a,x)
call system_clock(t2)
print*,"MATMULT :", real(t2-t1)/real(rate), " s"
call system_clock(t1)
! beta = 0 so b is overwritten; with beta = 1, dgemv computes b = A*x + b instead
call dgemv ( 'N', n, n, 1.0D+00, a, n, x, 1, 0.0D+00, b, 1 )
call system_clock(t2)
print*,"DGEMV   :", real(t2-t1)/real(rate), " s"
where
- Code:
FUNCTION MATMULT(dim,a,x)
IMPLICIT NONE
INTEGER :: dim
double precision, DIMENSION(dim,dim) :: a
double precision, DIMENSION(dim) :: x
double precision, DIMENSION(dim) :: MATMULT
!$omp parallel
!$omp workshare
! use the dummy argument dim here, not the host variable n
MATMULT(1:dim) = matmul( a(1:dim,1:dim), x(1:dim) )
!$omp end workshare
!$omp end parallel
END FUNCTION MATMULT
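One thing worth noting about the DGEMV call above: BLAS defines DGEMV as y := alpha*A*x + beta*y, so passing beta = 1.0D+00 adds A*x to whatever is already stored in b, rather than overwriting it. A minimal pure-Python sketch of that update rule (plain lists, no BLAS; the function name is mine, just for illustration):

```python
def dgemv_update(alpha, A, x, beta, y):
    """Mimic BLAS DGEMV semantics: y := alpha*A*x + beta*y."""
    for i in range(len(y)):
        acc = 0.0
        for j in range(len(x)):
            acc += A[i][j] * x[j]
        y[i] = alpha * acc + beta * y[i]
    return y

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]   # A*x = [3, 7]

# beta = 0.0 overwrites y with A*x
print(dgemv_update(1.0, A, x, 0.0, [9.0, 9.0]))  # [3.0, 7.0]

# beta = 1.0 accumulates into the existing y
print(dgemv_update(1.0, A, x, 1.0, [9.0, 9.0]))  # [12.0, 16.0]
```

So with beta = 1.0 the DGEMV result is not comparable to the MATMUL result, since b was already filled by the earlier calls.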
I am using the commands as follows (WITHOUT CHANGING THE CODE AT ALL) :
gfortran matrix_multi.f90 -lblas && ./a.out (plain, i.e. without OpenMP parallelization)
gfortran matrix_multi.f90 -lblas -fopenmp && ./a.out
gfortran matrix_multi.f90 -lf77blas -latlas && ./a.out
gfortran matrix_multi.f90 -lgoto2 -lpthread && ./a.out
gfortran m.f90 -L$MKLROOT/lib/ia32 -lmkl_blas95 -Wl,--start-group -lmkl_gf -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -m32 -fopenmp (from linkline advisor)
But for some reason, the intrinsic MATMUL is almost always faster than all of the others, and even when it is not the fastest it is very close to the fastest version.
Only with the MKL libraries is DGEMV always faster than the others.
Am I doing something wrong? I am completely confused.
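To judge whether any of these timings are plausible in the first place, it helps to convert them to a flop rate: a dense n x n matrix-vector product costs roughly 2*n^2 floating-point operations. A quick back-of-the-envelope helper (pure Python; the 0.5 s figure below is a made-up example, not one of my measurements):

```python
def gflops_matvec(n, seconds):
    """Approximate GFLOP/s for a dense n x n matrix-vector product (~2*n^2 flops)."""
    return 2.0 * n * n / seconds / 1e9

# Hypothetical example: a 15000x15000 matvec finishing in 0.5 s
# corresponds to 2*15000^2 / 0.5 / 1e9 = 0.9 GFLOP/s.
print(round(gflops_matvec(15000, 0.5), 3))  # 0.9
```

Matrix-vector products are memory-bound, so all correct implementations end up fairly close to the memory bandwidth limit, which may explain why the candidates are hard to tell apart.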
I don't know much about installation and such; I was barely able to install the libraries from Synaptic and link them.
I am a complete n00b.
I have already looked at all the links I could find, including one where someone compares LAPACK with Mathematica and MATLAB, but none of them seem to answer my question.
Thanks