Batched LU performance

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Postby maddyscientist » Tue Dec 15, 2015 9:03 pm

I'm using the batched LU routine in MAGMA, but performance is lower than I would have expected. I'm using 32x32 matrices with a batch size of 25,000, and running on an M6000 I am getting under 20 GFLOPS. This compares to batched matrix inversion (magma_cgetri_outofplace_batched), which reaches 120 GFLOPS. Is this performance expected? For such a large batch size I would have expected better.
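For context, a quick back-of-the-envelope check (a hedged sketch, not MAGMA code) shows how little arithmetic this workload actually contains, assuming the usual convention of counting complex LU as roughly (8/3)n^3 real flops (about (2/3)n^3 complex multiply-adds at 8 real flops each):

```python
# Hedged sketch: estimate the flop count and rate for the batch described
# above (25,000 complex 32x32 LU factorizations). The flop model is the
# standard convention, not a MAGMA-measured number.

def batched_lu_gflops(n, batch, seconds):
    """GFLOP/s achieved by `batch` complex n x n LU factorizations in `seconds`."""
    flops = batch * (8.0 / 3.0) * n**3  # ~(8/3) n^3 real flops per complex LU
    return flops / seconds / 1e9

# The entire batch is only ~2.2 GFLOP of work in total, so at 20 GFLOP/s
# it completes in ~0.1 s; such small per-matrix work makes it hard to
# saturate a GPU.
total_gflop = 25000 * (8.0 / 3.0) * 32**3 / 1e9
print(round(total_gflop, 2))
```

This is only a sanity check on the arithmetic intensity; the measured rate still depends on launch overhead and memory traffic for these tiny matrices.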

I also tried the no-pivot variant, which seems marginally faster (about 10%), but since there is no no-pivot variant of batched cgetri, I can't use it anyway.

Posts: 1
Joined: Tue Dec 15, 2015 8:49 pm

Re: Batched LU performance

Postby haidar » Fri Feb 26, 2016 11:56 am

Sorry for the late answer.
Which precision was the LU in (cgetrf?)?
What is the peak of your machine for that precision?
If you are really interested in this size, I will check whether we can provide you a specific version specially designed for 32x32 matrices.
Posts: 19
Joined: Fri Sep 19, 2014 3:43 pm
