magmablas gemm


Postby sachinfarfade » Wed Jun 09, 2010 6:17 am

Hello,
I am benchmarking the GEMM implementations in magmablas and CUBLAS, and I have some questions about the various GEMM versions in both.
1. None of the GEMM variants has a "_gpu" suffix. Does this mean these functions can be called with input matrices in CPU memory? I tried this, but it did not work. If it is possible, please point me to an example program. I want to use larger matrices as input (larger than 6000x6000), but I cannot because of the limited GPU memory.
2. The test of the magmablas_dgemm function failed on my machine. I am working on a Quadro FX 3700 (compute capability 1.1), which does not support double-precision computation. Is a GPU with double-precision support necessary for magmablas_dgemm to work properly?
3. Magmablas does not have a double-complex GEMM. Is a double-complex GEMM on the MAGMA roadmap? When can we expect it to be included?
4. On my machine, the SGEMM performance of magmablas is lower than that of CUBLAS. Is that to be expected? The output of testing_sgemm is pasted below.

[$] $ ./testing_sgemm
This is an Experimental Release of GEMM Routine without Padding

device 0: Quadro FX 3700, 1250.0 MHz clock, 511.3 MB memory

Usage:
./testing_sgemm N

N     magmablas0.2 GFlops/s   cudablas-2.3 GFlops/s   error
=============================================================================
512   140.9850                144.0877                0.000000e+00
513    20.9750                 34.5062                0.000000e+00
1024  128.5764                155.3446                0.000000e+00
1025   24.8981                 40.1077                0.000000e+00
1536  130.4093                158.0858                0.000000e+00
1537   26.2437                 40.9016                0.000000e+00
2048  131.0740                159.1819                0.000000e+00
2049   26.8537                 41.3387                0.000000e+00
2560  130.9564                159.0815                0.000000e+00
2561   27.1933                 41.5852                0.000000e+00
3072  131.0607                159.2249                0.000000e+00
3073   27.5137                 41.7357                0.000000e+00
3584  130.4331                158.6985                0.000000e+00
3585   27.7171                 41.7394                0.000000e+00
4096  129.8808                158.3672                0.000000e+00
4097   27.8780                 41.3897                0.000000e+00
4608  129.3252                158.2941                0.000000e+00
4609   27.9953                 41.5576                0.000000e+00
5120  129.4471                158.2078                0.000000e+00
5121   28.0693                 41.3611                0.000000e+00


Regards
sachinfarfade
 

Re: magmablas gemm

Postby rnath » Wed Jun 09, 2010 11:56 am

sachinfarfade wrote: Hello,
I am benchmarking the GEMM implementations in magmablas and CUBLAS, and I have some questions about the various GEMM versions in both.
1. None of the GEMM variants has a "_gpu" suffix. Does this mean these functions can be called with input matrices in CPU memory? I tried this, but it did not work. If it is possible, please point me to an example program. I want to use larger matrices as input (larger than 6000x6000), but I cannot because of the limited GPU memory.

The data must be in GPU memory. These are internal BLAS routines that were optimized for the higher-level MAGMA routines; they were included in the release because someone might find them useful.
You may be able to find an out-of-GPU-memory GEMM online.
Please check this out:
http://www.chem-quantum.info/scigpu/
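
For readers wondering how a GEMM can handle matrices larger than GPU memory, here is a minimal host-only sketch of the usual tiling idea. It is not the SCIGPU or MAGMA implementation. The names blocked_sgemm, copy_block, and tile_gemm are made up for illustration, the three small buffers stand in for GPU memory, and the plain C loop in tile_gemm stands in for a device GEMM call. Square column-major matrices are assumed.

```c
#include <stdlib.h>
#include <string.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Copy an m x n block between column-major arrays.
   Stand-in for a host<->device transfer (hypothetical helper). */
static void copy_block(int m, int n, const float *src, int lds,
                       float *dst, int ldd)
{
    for (int j = 0; j < n; ++j)
        memcpy(dst + (size_t)j * ldd, src + (size_t)j * lds,
               (size_t)m * sizeof(float));
}

/* tc += ta * tb on staged tiles, all with leading dimension ld.
   Stand-in for a device GEMM; plain C here (hypothetical helper). */
static void tile_gemm(int m, int n, int k, int ld,
                      const float *ta, const float *tb, float *tc)
{
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) {
            float b = tb[p + (size_t)j * ld];
            for (int i = 0; i < m; ++i)
                tc[i + (size_t)j * ld] += ta[i + (size_t)p * ld] * b;
        }
}

/* C = A * B for n x n column-major matrices, using only
   3 * tile * tile floats of "device" staging memory.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int blocked_sgemm(int n, int tile, const float *A, const float *B, float *C)
{
    if (n <= 0 || tile <= 0) return -1;
    size_t tsz = (size_t)tile * tile;
    float *buf = malloc(3 * tsz * sizeof(float));
    if (!buf) return -1;
    float *ta = buf, *tb = buf + tsz, *tc = buf + 2 * tsz;

    for (int j0 = 0; j0 < n; j0 += tile) {
        int nb = min_int(tile, n - j0);
        for (int i0 = 0; i0 < n; i0 += tile) {
            int mb = min_int(tile, n - i0);
            memset(tc, 0, tsz * sizeof(float));   /* fresh C tile */
            for (int p0 = 0; p0 < n; p0 += tile) {
                int kb = min_int(tile, n - p0);
                /* stage one tile of A and one tile of B */
                copy_block(mb, kb, A + i0 + (size_t)p0 * n, n, ta, tile);
                copy_block(kb, nb, B + p0 + (size_t)j0 * n, n, tb, tile);
                tile_gemm(mb, nb, kb, tile, ta, tb, tc);
            }
            /* write the finished C tile back to host memory */
            copy_block(mb, nb, tc, tile, C + i0 + (size_t)j0 * n, n);
        }
    }
    free(buf);
    return 0;
}
```

With this layout the staging memory depends only on the tile size, not on n, which is why inputs larger than 6000x6000 become feasible. A real version would also have to overlap transfers with computation to approach the in-memory GEMM rates.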

sachinfarfade wrote: 2. The test of the magmablas_dgemm function failed on my machine. I am working on a Quadro FX 3700 (compute capability 1.1), which does not support double-precision computation. Is a GPU with double-precision support necessary for magmablas_dgemm to work properly?

Yes, unless the compiler supports it by doing something fancy.
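
As a rough illustration, a program could check the compute capability before calling a double-precision kernel. NVIDIA GPUs have double-precision hardware from compute capability 1.3 onward. The function name supports_double is made up for this sketch; in a CUDA program the major and minor values would normally come from the cudaDeviceProp structure filled in by cudaGetDeviceProperties.

```c
#include <stdbool.h>

/* Returns true if a device with the given compute capability has
   double-precision hardware (compute capability 1.3 or higher).
   supports_double is a hypothetical helper, not a MAGMA or CUDA call. */
bool supports_double(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}
```

For the Quadro FX 3700 (compute capability 1.1) this returns false, while for the GTX280 and Tesla C1060 (both 1.3) it returns true.
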
sachinfarfade wrote: 3. Magmablas does not have a double-complex GEMM. Is a double-complex GEMM on the MAGMA roadmap? When can we expect it to be included?

We don't want to maintain all of BLAS for all GPUs. Usually we expect NVIDIA to provide highly optimized BLAS kernels. For our own needs, however, we develop different kernels and try to include them in the release, since they are really useful. That is what happened with the last release. Some of the interesting kernels might be available in the new release, but not for all architectures.

sachinfarfade wrote: 4. On my machine, the SGEMM performance of magmablas is lower than that of CUBLAS. Is that to be expected? The output of testing_sgemm is pasted below.

It is possible, because MAGMA's BLAS kernels were tuned for two powerful and widely used cards, the GTX280 and the Tesla C1060. The next release will be tuned for the GTX480, GTX280, and Tesla 1060 and 2060.


Regards
rnath
 

Re: magmablas gemm

Postby sachinfarfade » Mon Jun 21, 2010 9:04 am

Hi,
Has anyone used the SCIGPU BLAS library available at http://www.chem-quantum.info/scigpu/ ?
I am interested in the library because it provides matrix multiplication with inputs in CPU memory,
so very large matrices can also be used.
I was able to compile the SCIGPU test code, but its output does not match the reference
implementation. Has anyone faced a similar problem?
Any help in this regard would be appreciated.

Thanks in advance.
sachinfarfade
 

