CPU: clMagma vs Magma/Lapack

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)


Post by fsergeyal » Mon Aug 22, 2016 10:05 am

Hi All,

I'm going to use MAGMA in my research and have a question that neither Google nor I could answer.

Suppose I have no discrete GPU but do have a good CPU (Intel Skylake).

Which performs better: MAGMA/LAPACK on the CPU, or clMAGMA with OpenCL targeting the CPU?
For instance, for matrix-matrix multiplication.

I suppose the answer is clMAGMA, but I cannot find any information on how to compile clMAGMA against the Intel OpenCL SDK for CPU.

Am I the first to try using clMAGMA with Intel OpenCL on a CPU? :)

Thanks, Sergey
Last edited by fsergeyal on Tue Aug 23, 2016 12:50 am, edited 1 time in total.
fsergeyal
 
Posts: 3
Joined: Mon Aug 22, 2016 9:56 am

Re: CPU: clMagma vs Magma/Lapack

Post by mgates3 » Mon Aug 22, 2016 9:01 pm

With no GPU, you can just use LAPACK. Vendor and open source libraries (such as MKL, ACML, OpenBLAS) include both LAPACK and BLAS, and are optimized for CPUs.

MAGMA and clMAGMA are both designed for use with an added GPU, where the CPU BLAS routines do not operate. Theoretically, it should be possible to treat the CPU as an OpenCL device and run clBLAS on the CPU, but it wouldn't be as optimized as the vendor BLAS, and would incur extra data copies (between the "host" CPU and the OpenCL "device" CPU).

-mark
mgates3
 
Posts: 738
Joined: Fri Jan 06, 2012 2:13 pm

Re: CPU: clMagma vs Magma/Lapack

Post by fsergeyal » Tue Aug 23, 2016 12:50 am

mgates3 wrote: With no GPU, you can just use LAPACK. Vendor and open source libraries (such as MKL, ACML, OpenBLAS) include both LAPACK and BLAS, and are optimized for CPUs.

MAGMA and clMAGMA are both designed for use with an added GPU, where the CPU BLAS routines do not operate. Theoretically, it should be possible to treat the CPU as an OpenCL device and run clBLAS on the CPU, but it wouldn't be as optimized as the vendor BLAS, and would incur extra data copies (between the "host" CPU and the OpenCL "device" CPU).

-mark


Thanks, Mark.

As far as I know, LAPACK does not use CPU features such as SSE, AVX, AVX2, or FMA.
Also, data moved between the "host" CPU and the OpenCL "device" is copied from DDR4 to the same DDR4, in bulk rather than byte by byte, so it should be very fast.

So there are two sides of the scale to weigh here, and I suppose clMAGMA on an OpenCL CPU device would be faster when multiplying thousands of 4000*4000 matrices.
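
To put rough numbers on the copy-versus-compute trade-off raised here, the following is a minimal back-of-the-envelope sketch in Python. The function names and the throughput and bandwidth figures (100 GFLOP/s, 10 GB/s) are illustrative assumptions, not measurements of any CPU or library; only the operation count (2n^3 for an n*n product) and the storage size (8 bytes per double) are standard.

```python
# Rough cost model for one n x n double-precision matrix product (GEMM).
# The rates passed to estimate() are assumed values for illustration only.

def gemm_flops(n):
    """Floating-point operations for C = A * B with n x n matrices
    (n^3 multiplications plus n^3 additions)."""
    return 2 * n**3


def matrix_bytes(n, bytes_per_element=8):
    """Storage needed for one n x n matrix of doubles."""
    return n * n * bytes_per_element


def estimate(n, gflops_per_s, copy_gb_per_s):
    """Return (compute_seconds, copy_seconds) for one product.

    The copy term covers three matrices: A and B sent to the
    OpenCL device and C read back to the host.
    """
    compute = gemm_flops(n) / (gflops_per_s * 1e9)
    copy = 3 * matrix_bytes(n) / (copy_gb_per_s * 1e9)
    return compute, copy


n = 4000
# Assumed rates: 100 GFLOP/s for the kernel, 10 GB/s for host<->device copies.
compute_s, copy_s = estimate(n, gflops_per_s=100.0, copy_gb_per_s=10.0)
print(f"flops per product: {gemm_flops(n):.3e}")
print(f"bytes per matrix:  {matrix_bytes(n)}")
print(f"compute: {compute_s:.3f} s, copies: {copy_s:.4f} s")
```

Under these assumed rates the copies cost far less than the arithmetic for a 4000*4000 product, so the comparison would be decided mainly by how fast the GEMM kernel itself runs.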

Sergey
fsergeyal
 
Posts: 3
Joined: Mon Aug 22, 2016 9:56 am

Re: CPU: clMagma vs Magma/Lapack

Post by fsergeyal » Tue Aug 23, 2016 12:55 am

Well, it seems I was wrong.

Intel MKL supports AVX2:
https://software.intel.com/en-us/articl ... intel-avx2

So I have no further questions now. :)
fsergeyal
 
Posts: 3
Joined: Mon Aug 22, 2016 9:56 am

Re: CPU: clMagma vs Magma/Lapack

Post by mgates3 » Tue Aug 23, 2016 9:30 am

Yes, all the modern BLAS libraries (MKL, ACML, OpenBLAS, ATLAS) will use as much SSE/AVX/etc. as they can. LAPACK itself doesn't have explicit SSE/AVX/etc. calls, but relies on the optimized BLAS for the bulk of its computation.
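
As an illustration of how a LAPACK-style routine can push most of its arithmetic into a BLAS kernel, here is a simplified Python sketch of a blocked LU factorization. It is not LAPACK's code: it omits the partial pivoting that LAPACK's LU routine performs, and the names `reference_gemm`, `blocked_lu`, and `multiply_lu` are hypothetical. The point is only that the trailing update is a plain matrix-matrix product, which is the step an optimized BLAS would run with SSE/AVX code.

```python
def reference_gemm(M, rows, cols, inner):
    """Trailing update M[i][j] -= sum_k M[i][k] * M[k][j].

    Hypothetical stand-in for an optimized BLAS GEMM kernel.
    """
    for i in rows:
        for j in cols:
            s = 0.0
            for k in inner:
                s += M[i][k] * M[k][j]
            M[i][j] -= s


def blocked_lu(M, nb, gemm=reference_gemm):
    """In-place blocked LU without pivoting (illustration only).

    On return, the strict lower triangle of M holds L (unit diagonal
    implied) and the upper triangle holds U.
    """
    n = len(M)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Factor the current panel (columns k0..k1-1) with scalar code.
        for k in range(k0, k1):
            for i in range(k + 1, n):
                M[i][k] /= M[k][k]
                for j in range(k + 1, k1):
                    M[i][j] -= M[i][k] * M[k][j]
        # 2. Forward substitution to form the block row of U.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    M[i][j] -= M[i][k] * M[k][j]
        # 3. Trailing update: a matrix-matrix product, delegated to gemm.
        if k1 < n:
            gemm(M, range(k1, n), range(k1, n), range(k0, k1))
    return M


def multiply_lu(M):
    """Rebuild L * U from the packed factors, for checking the result."""
    n = len(M)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else M[i][k]
                s += l_ik * M[k][j]
            A[i][j] = s
    return A


# Small diagonally dominant test matrix, so skipping pivoting is safe here.
A = [[4.0, 1.0, 2.0, 0.5],
     [1.0, 5.0, 1.0, 2.0],
     [2.0, 1.0, 6.0, 1.0],
     [0.5, 2.0, 1.0, 7.0]]
F = blocked_lu([row[:] for row in A], nb=2)
R = multiply_lu(F)
err = max(abs(R[i][j] - A[i][j]) for i in range(4) for j in range(4))
print("max reconstruction error:", err)
```

Because `gemm` is passed in as a parameter, the factorization code stays the same whichever kernel does the heavy lifting, which mirrors the division of labor described above.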

All the above libraries are freely available. MKL now has a community license.
https://software.intel.com/en-us/articles/free-mkl

-mark
mgates3
 
Posts: 738
Joined: Fri Jan 06, 2012 2:13 pm

