## BLAS Issue: cblas_sgemm time dependent on matrix values

Open discussion regarding features, bugs, issues, vendors, etc.

### BLAS Issue: cblas_sgemm time dependent on matrix values

What could be the reason behind a cblas_sgemm call taking much less time for matrices with a large number of zeros as compared to the same cblas_sgemm call for dense matrices?

I know gemv is designed for matrix-vector multiplication but why can't I use gemm for vector-matrix multiplication if it takes less time, especially for sparse matrices

A short representative code is given below. It asks to enter a value, and then populates a vector with that value. It then replaces every 32nd value with its index. So, if we enter '0' then we get a sparse vector but for other values we get a dense vector.

Code: Select all
#include <iostream>
#include <stdio.h>
#include <time.h>
#include <cblas.h>

using namespace std;

int main()
{
const int m = 5000;

timespec blas_start, blas_end;
long totalnsec; //total nano sec
double totalsec, totaltime;
int i, j;
float *A = new float[m]; // 1 x m
float *B = new float[m*m]; // m x m
float *C = new float[m]; // 1 x m

float input;
cout << "Enter a value to populate the vector (0 for sparse) ";
cin >> input; // enter 0 for sparse

// input martix A: every 32nd element is non-zero, rest of the values = input
for(i = 0; i < m; i++)
{
A[i] = input;
if( i % 32 == 0)    //adjust for sparsity
A[i] = i;
}

// input matrix B: identity matrix
for(i = 0; i < m; i++)
for(j = 0; j < m; j++)
B[i*m + j] = (i==j);

clock_gettime(CLOCK_REALTIME, &blas_start);
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 1, m, m, 1.0f, A, m, B, m, 0.0f, C, m);
clock_gettime(CLOCK_REALTIME, &blas_end);

/* for(i = 0; i < m; i++)
printf("%f ", C[i]);
printf("\n\n");    */

// Print time
totalsec = (double)blas_end.tv_sec - (double)blas_start.tv_sec;
totalnsec = blas_end.tv_nsec - blas_start.tv_nsec;
if(totalnsec < 0)
{
totalnsec += 1e9;
totalsec -= 1;
}
totaltime = totalsec + (double)totalnsec*1e-9;
cout<<"Duration = "<< totaltime << "\n";

return 0;
}

I run it as follows in Ubuntu 14.04 with blas 3.0
erisp@ubuntu:~/uas/stackoverflow\$ g++ gemmcomp.cpp -o gemmcomp.o -lblas
erisp@ubuntu:~/uas/stackoverflow\$ ./gemmcomp.o
Enter a value to populate the vector (0 for sparse) 5
Duration = 0.0291558
erisp@ubuntu:~/uas/stackoverflow\$ ./gemmcomp.o
Enter a value to populate the vector (0 for sparse) 0
Duration = 0.000959521

So, it is clear that it takes much less time for sparse matrices.

The results of following commands might also be useful

erisp@ubuntu:~/uas/stackoverflow\$ ll -d /usr/lib/libblas* /etc/alternatives/libblas.*
lrwxrwxrwx 1 root root 26 مارچ 13 2015 /etc/alternatives/libblas.a -> /usr/lib/libblas/libblas.a
lrwxrwxrwx 1 root root 27 مارچ 13 2015 /etc/alternatives/libblas.so -> /usr/lib/libblas/libblas.so
lrwxrwxrwx 1 root root 29 مارچ 13 2015 /etc/alternatives/libblas.so.3 -> /usr/lib/libblas/libblas.so.3
lrwxrwxrwx 1 root root 29 مارچ 13 2015 /etc/alternatives/libblas.so.3gf -> /usr/lib/libblas/libblas.so.3
drwxr-xr-x 2 root root 4096 مارچ 13 2015 /usr/lib/libblas/
lrwxrwxrwx 1 root root 27 مارچ 13 2015 /usr/lib/libblas.a -> /etc/alternatives/libblas.a
lrwxrwxrwx 1 root root 28 مارچ 13 2015 /usr/lib/libblas.so -> /etc/alternatives/libblas.so
lrwxrwxrwx 1 root root 30 مارچ 13 2015 /usr/lib/libblas.so.3 -> /etc/alternatives/libblas.so.3
lrwxrwxrwx 1 root root 32 مارچ 13 2015 /usr/lib/libblas.so.3gf -> /etc/alternatives/libblas.so.3gf

erisp@ubuntu:~/uas/stackoverflow\$ ldd ./gemmcomp.o
linux-gate.so.1 => (0xb76f6000)
libblas.so.3 => /usr/lib/libblas.so.3 (0xb765e000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xb7576000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb73c7000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xb7381000)
/lib/ld-linux.so.2 (0xb76f7000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xb7364000)

malang

Posts: 1
Joined: Mon Jun 27, 2016 12:13 pm

### Re: BLAS Issue: cblas_sgemm time dependent on matrix values

I executed your code on laptop MacBook Pro with the Accelerate framework (for the BLAS) and I do not see any time difference whether I enter `0` or anything else. Maybe some other users could try on other architectures and report. It is not expected that BLAS takes into account a potential sparsity to speed-up the computation. Julien.
Julien Langou

Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

### Re: BLAS Issue: cblas_sgemm time dependent on matrix values

I tested the code on my laptop (with rather slow memory), Ubuntu 16.04.
1. Test on reference BLAS (from LAPACK 3.6.1) - both "sparse" and dense vectors resulted in similar timings: 0.041 - 0.042 sec.
2. Test on ATLAS BLAS - both "sparse" and dense vectors resulted in similar timings: 0.071 - 0.076 sec.

(Off-topic: it's weird ATLAS was slower than the reference BLAS here; but in other circumstances ATLAS is faster, e.g. on my laptop DGESV works 3-4 times faster with ATLAS than with ref-BLAS, I tested for matrices of sizes 1000 - 6000).
ilya-fursov

Posts: 6
Joined: Thu Nov 10, 2016 9:00 am