SVD on Magma

Postby sunils » Mon Jan 17, 2011 6:48 pm

Hello,

Is there a date for a release of Magma that does SVD computations? Please let me know.

If it is in Beta stage and you need a tester, I will be glad to help too.

Thanks and kind regards,
Linus
sunils
 
Posts: 4
Joined: Mon Jan 17, 2011 6:44 pm

Re: SVD on Magma

Postby sunils » Fri Jan 21, 2011 12:38 am

How come no one has replied?
I'm surprised even the admin/product owner has no comments.

:o
sunils
 
Posts: 4
Joined: Mon Jan 17, 2011 6:44 pm

Re: SVD on Magma

Postby Stan Tomov » Fri Jan 21, 2011 2:46 am

Hello,
We are preparing it and will release it in MAGMA 1.0 RC3 tomorrow.
We would be happy to get help with the testing as we prepare the final
MAGMA 1.0 release. Thank you for offering it.
Best regards,
Stan
Stan Tomov
 
Posts: 249
Joined: Fri Aug 21, 2009 10:39 pm

Re: SVD on Magma

Postby sunils » Fri Jan 21, 2011 3:00 am

Thank you Stan. I appreciate your response.
I will look for the download on the site and try it out this weekend.
Do let me know if you need me to test something specific.
sunils
 
Posts: 4
Joined: Mon Jan 17, 2011 6:44 pm

Re: SVD on Magma

Postby fletchjp » Mon Jan 24, 2011 4:52 pm

Please could we have some test programs for the SVD routines.

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: SVD on Magma

Postby Stan Tomov » Tue Jan 25, 2011 2:14 pm

We have to do that. Outside contributions are also always welcome!

For now we are concentrating on testing the main components that the eigen- and SVD-solvers would use. The SVD currently in MAGMA 1.0 is the LAPACK SVD in which certain components are GPU-accelerated; the ones that are not are done on the multicore host. Several optimizations can still be made to what has been released so far. An important one (for the basic algorithms released) would be optimizing the Level 2 BLAS.

I am curious whether soliciting/challenging the community to produce fast Level 2 BLAS would lead to near-optimal algorithms. In particular, gemv was running at near-optimal speed on the older generation of GPUs. For example, the GTX280 has a bandwidth of 140 GB/s, so optimal sgemv would be 70 GFlop/s, and MAGMA achieves 65 GFlop/s. In theory ssymv could reach up to 140 GFlop/s; we have seen algorithms achieving a little more than 100 GFlop/s. The question is whether we can extend these results (and possibly improve them) on the Fermi architecture. The problem here is that ECC seems to impose too large a performance hit on memory-bound computations (and, as the bandwidth was not increased in Fermi, Level 2 BLAS on Fermi is slower than on the older Tesla architectures).
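The bandwidth-bound figures quoted above can be reproduced with a short calculation. This is only a sketch: it assumes the kernel is limited purely by reading the single-precision matrix once from memory and ignores the vector traffic; the function name is made up for illustration.

```python
def bandwidth_bound_gflops(bandwidth_gb_s, bytes_per_element, flops_per_element):
    """Upper bound on GFlop/s for a kernel limited by reading the matrix once."""
    billions_of_elements_per_second = bandwidth_gb_s / bytes_per_element
    return billions_of_elements_per_second * flops_per_element

GTX280_BANDWIDTH = 140.0  # GB/s, the figure quoted in the post

# sgemv: every 4-byte matrix element is read once and used in one multiply-add.
sgemv_peak = bandwidth_bound_gflops(GTX280_BANDWIDTH, 4, 2)

# ssymv: only one triangle is read, so each element read serves two multiply-adds.
ssymv_peak = bandwidth_bound_gflops(GTX280_BANDWIDTH, 4, 4)

print(sgemv_peak, ssymv_peak)      # 70.0 140.0
print(round(65 / sgemv_peak, 2))   # fraction of the sgemv bound reached at 65 GFlop/s
```

Under these assumptions, 65 GFlop/s is a little over 90% of the sgemv bound, while 100 GFlop/s for ssymv leaves noticeably more headroom.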

Stan
Stan Tomov
 
Posts: 249
Joined: Fri Aug 21, 2009 10:39 pm

Re: SVD on Magma

Postby fletchjp » Thu Feb 10, 2011 8:37 pm

Stan

I have just started some testing on the zgesvd example. It looks as though quite a lot of work is needed: it does not seem any faster on the GPU than on the CPU, and something is making it run on a single CPU thread most of the time.
I am running single cases, as the whole run crashes with a seg fault and in any case the larger cases are very slow.

I will report some numbers.

John

I have now done some testing - for square matrices each doubling takes about 10 times as long and there is little difference between the CPU and GPU times.

Square matrices (times in seconds)
 M=N        CPU        GPU
 512       2.35       2.44
1000      18.40      18.49
1024      23.65      23.62
2048     257.25     254.97
4096    2858.04    2832.85
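The "about 10 times per doubling" observation can be checked directly from the table above. The sketch below uses only the rows whose size doubles (512, 1024, 2048, 4096) and also converts each ratio into an effective exponent p in time ~ N^p; for comparison, a pure O(N^3) cost would give a ratio of 8 and an exponent of 3.

```python
import math

# (M = N, CPU seconds, GPU seconds) copied from the table above
timings = [
    (512, 2.35, 2.44),
    (1024, 23.65, 23.62),
    (2048, 257.25, 254.97),
    (4096, 2858.04, 2832.85),
]

def doubling_ratios(rows, column):
    """Ratio of consecutive timings; each row's size must be double the previous one."""
    return [b[column] / a[column] for a, b in zip(rows, rows[1:])]

for name, column in (("CPU", 1), ("GPU", 2)):
    ratios = doubling_ratios(timings, column)
    exponents = [math.log2(r) for r in ratios]
    print(name, [round(r, 1) for r in ratios], [round(e, 2) for e in exponents])
```

The ratios come out between roughly 10 and 11 for both columns, so the growth is somewhat steeper than cubic over this range.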

I have also done some work on rectangular matrices. I changed the job settings in the calls from A,A to S,S so that only the relevant rectangles are calculated. In some cases the work size supplied is smaller than the value returned for maxwork, and increasing it makes a big difference to the timings. There is little difference between the CPU and GPU timings in most cases. However, there is one case, M = 1000, N = 1024, where the CPU run gives a result but the GPU run fails with INFO=999.
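To illustrate what switching from A,A to S,S changes, here is a small sketch of the output shapes under the standard LAPACK xGESVD convention: with 'A' the full square U and VT are returned, with 'S' only the first min(M, N) columns of U and rows of VT. The helper names are hypothetical and the memory figure counts only the two output matrices, at 16 bytes per double-complex entry.

```python
def gesvd_output_shapes(m, n, job):
    """Shapes of U and VT for jobu = jobvt = job ('A' or 'S')."""
    k = min(m, n)
    if job == "A":   # all columns of U, all rows of VT
        return (m, m), (n, n)
    if job == "S":   # only the first min(M, N) columns of U / rows of VT
        return (m, k), (k, n)
    raise ValueError("job must be 'A' or 'S'")

BYTES_PER_COMPLEX16 = 16  # zgesvd works on double-precision complex entries

def output_megabytes(m, n, job):
    """Storage for U and VT together, in MiB."""
    u, vt = gesvd_output_shapes(m, n, job)
    return (u[0] * u[1] + vt[0] * vt[1]) * BYTES_PER_COMPLEX16 / 2**20

for job in ("A", "S"):
    print(job, gesvd_output_shapes(512, 10240, job), output_megabytes(512, 10240, job))
```

For a strongly rectangular case such as 512 by 10240, the reduced form needs a small fraction of the storage of the full form, which is why the setting matters for these tests.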

Experiments show that the failing case takes the branch labelled "Path 10t" (N > M, but not much larger) in zgesvd.cpp.

Maximum worksize reported:

GPU (rows: M, columns: N; "-" marks a size with no reported value)
    M \ N      512     1000     1024     2048     4096     8192    10240
      512    33792   295936   295936   295936   295936   295936   295936
     1000   295936    66000    66768  1066000  1066000  1066000  1066000
     1024   295936    66768    67584  1116160  1116160  1116160        -
     2048   295936  1066000  1116160   135168  4329472        -        -
     4096   295936  1066000        -  4329472        -        -        -
     8192   295936  1066000        -        -        -        -        -
    10240   295936        -        -        -        -        -        -

Note that the N by N memory sizes reported are much smaller than for other sizes, where the size only depends on the smaller dimension. When the small size is 512 the cases go on being very quick. It may be that I am just reporting known things about the CPU algorithm. I hope it will help you understand what is going on.
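The reported worksizes happen to fit a simple pattern, sketched below. This is an empirical fit to the numbers in this post, not a formula taken from the MAGMA or LAPACK source: the block size of 32 and the 1.6 crossover factor are assumptions chosen because they reproduce the table, and the function name is made up.

```python
NB = 32          # assumed block size; chosen because it reproduces the table
THRESHOLD = 1.6  # assumed crossover factor between "nearly square" and "much larger"

def fitted_maxwork(m, n):
    """Empirical fit to the reported maximum worksizes (not MAGMA's own formula)."""
    small, large = min(m, n), max(m, n)
    if large >= int(small * THRESHOLD):
        # One dimension much larger: the size depends only on the smaller one.
        return small * small + 2 * small + 2 * small * NB
    # Nearly square: no small-by-small work matrix is included.
    return 2 * small + (m + n) * NB

reported = {
    (512, 512): 33792, (512, 1000): 295936, (1000, 1000): 66000,
    (1000, 1024): 66768, (1024, 1000): 66768, (1024, 1024): 67584,
    (1000, 2048): 1066000, (1024, 2048): 1116160,
    (2048, 2048): 135168, (2048, 4096): 4329472,
}

for (m, n), value in reported.items():
    print(m, n, value, fitted_maxwork(m, n))
```

If the fit reflects the real logic, the extra small-by-small term explains why the square and nearly square cases report so much less workspace than the strongly rectangular ones.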

GPU times in seconds

GPU (rows: M, columns: N; (999) is the run that failed with INFO=999; "-" marks a size with no reported time)
    M \ N     512    1000    1024    2048    4096    8192   10240
      512    2.57    3.57    3.65    4.26    5.14    7.12    7.99
     1000    3.49   18.44   (999)   25.53   29.58   34.97   40.19
     1024    3.50   19.77   23.73   60.27   70.11   79.85   81.19
     2048    3.94   27.48   38.56  255.82  914.03       -       -
     4096    4.73   30.42   42.89  483.05       -       -       -
     8192    5.91   35.17   44.59       -       -       -       -
    10240    6.77   36.35   47.82       -       -       -       -

Note some lack of symmetry with N>M slower than N<M for the same sizes. The 2048 by 4096 value may come down (it needed more workspace). Indeed it does, from 914.03 to 523.02 with enough workspace.

The failing case has N>M by a small amount (M = 1000, N = 1024). I have not yet tried any other similar cases.

device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: SVD on Magma

Postby fletchjp » Tue Feb 15, 2011 6:19 pm

Stan

I have been doing some more testing and I think there are problems in the base CPU version. Some specific sizes, e.g. 1024, 2048 and 4096, seem to be slower than the sizes around them. I also notice from monitoring CPU activity that, although I am using a BLAS which supports 4 threads, the CPU and GPU routines use only 1 thread for much of the time. I am reluctant to get into the CPU code.

See my previous post for details of a case which fails on the GPU. I have overcome this by setting the limit (mnthr) so that it skips that case and treats all cases where N>M as large N>M. There is more irregularity about timings for these than for M>N.
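For readers unfamiliar with mnthr, the sketch below mimics how reference LAPACK xGESVD chooses between its code paths: the crossover point defaults to 1.6 times the smaller dimension, and shapes beyond it are first reduced by a QR (tall) or LQ (wide) factorization. The `force_wide_path` flag is a hypothetical stand-in for the workaround described above; how the limit was actually changed in zgesvd.cpp is not shown in this thread.

```python
def gesvd_path(m, n, force_wide_path=False):
    """Classify a shape the way reference LAPACK xGESVD picks its code path."""
    mnthr = int(min(m, n) * 1.6)  # reference LAPACK default crossover point
    if m >= n:
        return "paths 1-9 (QR first)" if m >= mnthr else "path 10"
    if force_wide_path or n >= mnthr:
        return "paths 1t-9t (LQ first)"
    return "path 10t"

print(gesvd_path(1000, 1024))                        # path 10t
print(gesvd_path(1000, 2048))                        # paths 1t-9t (LQ first)
print(gesvd_path(1000, 1024, force_wide_path=True))  # paths 1t-9t (LQ first)
```

With the default threshold, the 1000 by 1024 case lands on path 10t; forcing the wide route sends every N > M shape through the LQ-first paths and leaves the M >= N cases untouched.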

Cheers

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: SVD on Magma

Postby fletchjp » Sat Feb 19, 2011 7:44 pm

Stan

I have been digging into ZGESVD to find out why it is so slow and single threaded most of the time despite my use of GotoBLAS2. As far as I can work out, it spends most of its time in ZBDSQR, which I have eventually tracked down as being implemented in FORTRAN as part of LAPACK 3.1.1 in GotoBLAS2. That routine does not seem to make much use of BLAS routines, so improvements to ZGESVD will clearly have to come from work on ZBDSQR, either to make it use the multithreaded BLAS or to make some use of the GPU.
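Amdahl's law gives a rough feel for why accelerating everything except ZBDSQR would change the total time so little. The fractions below are hypothetical, since no profile breakdown is given in this thread; they only stand for "most of the time".

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part runs `parallel_speedup` times faster."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)

# Hypothetical fractions of run time spent in the single-threaded ZBDSQR step.
for serial_fraction in (0.5, 0.8, 0.9):
    for parallel_speedup in (4, 100):
        print(serial_fraction, parallel_speedup,
              round(amdahl_speedup(serial_fraction, parallel_speedup), 2))
```

Even with a 100-fold speedup of the remaining work, the overall gain stays below 1 / serial_fraction, so a run dominated by the serial step barely changes.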

I will wait to see what differences there are in RC4. For now, my timings of the existing routine will serve as a benchmark for any changes.

Best wishes

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: SVD on Magma

Postby Stan Tomov » Tue Apr 12, 2011 11:38 pm

John,
The case n > m is now fixed in RC5. The problem was that the bidiagonalization was not implemented for the n > m case; it has now been added. Performance improvements will come as we add GPU acceleration to the other parts of the algorithm.
Stan
Stan Tomov
 
Posts: 249
Joined: Fri Aug 21, 2009 10:39 pm
