sgetrf_gpu calculate partial on cpu! why?

Open discussion for MAGMA

Post by tomac » Wed Aug 03, 2011 7:01 am

I've noticed that the factorization (sgetrf_gpu) always does a small part of the computation on the CPU. Why is that? The synchronization, the get/set matrix transfers, and restarting the kernel take a lot of time.
Is it possible to turn this off, or to do this part on the GPU as well?

Edit: Ah, I finally found your paper "Dense Linear Algebra Solvers for..."
Now I think I understand: it's necessary for pivoting? Is it possible to deactivate pivoting? I don't need it for a Laplace matrix.
And why does the CPU do the factorization of the panels?

Thanks, tomac
tomac
 
Posts: 7
Joined: Wed Jan 26, 2011 5:06 am

Re: sgetrf_gpu calculate partial on cpu! why?

Post by Stan Tomov » Thu Aug 11, 2011 4:45 pm

Hi Tomac,
It is possible to deactivate the pivoting, and the code would be faster. We will add it to the release.
In general the panels are difficult to parallelize and would not run on the GPU as efficiently as Level 3 BLAS. Therefore we schedule/execute them on the CPU. We manage to overlap (for N big enough) the CPU work with the updates on the GPU. As a result, the algorithm runs as fast as we can do the Level 3 BLAS (needed for the algorithm) on the GPU.
Stan
Stan Tomov
 
Posts: 250
Joined: Fri Aug 21, 2009 10:39 pm
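
To make the panel/update split from the reply above concrete, here is a minimal pure-Python sketch of a blocked LU factorization without pivoting. It is not MAGMA's code: the names `lu_nopiv_blocked`, `laplace_1d` and `multiply_lu` are hypothetical, and everything runs serially on the CPU. The point is only to show which step is the narrow, sequential panel factorization (the part the reply says is scheduled on the CPU) and which step is the large matrix-matrix update (the Level 3 BLAS part that would run on the GPU).

```python
def lu_nopiv_blocked(a, nb):
    """Blocked LU without pivoting, in place, on a square list-of-lists matrix.

    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U. Illustrative only; assumes no zero pivots.
    """
    n = len(a)
    for k in range(0, n, nb):
        end = min(k + nb, n)
        # 1) Panel factorization on columns k..end-1 (all rows below k).
        #    Each column depends on the previous one, so little parallelism.
        for j in range(k, end):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, end):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2) Triangular solve for the block row of U to the right of the panel.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3) Trailing update A22 -= L21 * U12: a plain matrix-matrix product,
        #    which is where most of the arithmetic happens for large n.
        for i in range(end, n):
            for c in range(end, n):
                s = 0.0
                for j in range(k, end):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return a


def laplace_1d(n):
    """Dense 1D Laplace matrix: 2 on the diagonal, -1 next to it."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def multiply_lu(lu):
    """Rebuild L*U from the packed factors to check the factorization."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out
```

For example, `lu_nopiv_blocked(laplace_1d(6), 2)` factors a small Laplace matrix with a block size of 2, and `multiply_lu` of the result should reproduce the original matrix up to rounding. Skipping pivoting is safe here only because this matrix is symmetric positive definite; for a general matrix it can fail or be unstable.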

Re: sgetrf_gpu calculate partial on cpu! why?

Post by tomac » Tue Aug 16, 2011 11:15 am

Hi Stan, first of all, thanks for the reply; it's really helpful for understanding.
When is the next release date?
Is it possible to get a pre-release patch, a trunk version, or something like that?
Thanks again
Tomac
tomac
 
Posts: 7
Joined: Wed Jan 26, 2011 5:06 am

Re: sgetrf_gpu calculate partial on cpu! why?

Post by tomac » Fri Sep 09, 2011 7:32 am

Thanks for the nopiv versions.

Nevertheless, Fortran beats the GPU.

I have done some tests with sgetrf on the CPU and on the GPU.
At first I thought the GPU results were bad, but then I calculated the flops and realized that I got impossible results for the CPU variant.
My question is: how is that possible? The result matrices are correct and I measure the time at the right place.
For 5046x5046, the factorization on the CPU with Fortran sgetrf took 0.25 s ~ 514 Gflops.
OMG, I have a little supercomputer under my desk.
What is going on? Is the dense matrix transformed into a sparse one? My matrix has mainly zero entries, because I converted a sparse matrix into a dense one.
I subtracted 1 from every entry in my matrix, and now the results are plausible:
factorization on the CPU with Fortran sgetrf = 18.79 s ~ 6.8 Gflops
That means Fortran really recognizes zeros, but the GPU version doesn't.
tomac
 
Posts: 7
Joined: Wed Jan 26, 2011 5:06 am
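
The arithmetic behind the two rates in the post above can be reproduced with a short sketch. The helper names `lu_flops` and `gflops` are hypothetical. The quoted 514 and 6.8 Gflops match a count of n^3 operations; the usual leading-order count for an LU factorization is about 2n^3/3, which would give roughly 343 and 4.6 Gflops for the same timings. The conclusion of the post does not change either way: the 0.25 s run is far faster than the run on the shifted matrix.

```python
def lu_flops(n):
    """Leading-order flop count for LU of an n x n matrix (about 2/3 * n^3)."""
    return 2.0 * n ** 3 / 3.0


def gflops(flops, seconds):
    """Convert an operation count and a wall-clock time to Gflop/s."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    n = 5046
    # Timings taken from the post: mostly-zero matrix, then the shifted one.
    for seconds in (0.25, 18.79):
        print(seconds,
              gflops(n ** 3, seconds),        # count used in the post (n^3)
              gflops(lu_flops(n), seconds))   # standard 2/3 * n^3 count
```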