Batched GEMV with float4

Open discussion for the MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Batched GEMV with float4

Postby Genji » Wed Jun 21, 2017 6:28 am

I want to use batched GEMV with a vector of float4 elements. However, I am unsure how to proceed: do I break up my float4, run the gemv four times, and then reassemble them, or do I just pass the size of float4 as inc and let it sort itself out?
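
To make the second option concrete, here is a rough sketch of what I have in mind (untested; d_A, d_x, d_y, m, n, ldda and queue are just illustrative names):

Code:
    // Option 2: view the float4 arrays as plain float arrays and let
    // inc = 4 walk one component at a time, so no manual split/reassemble
    // is needed. d_A is m x n single precision (column major, ldda),
    // d_x holds n float4's, d_y holds m float4's, all on the device.
    float *x = (float*) d_x;   // 4*n floats, component c starts at x + c
    float *y = (float*) d_y;   // 4*m floats
    for (int c = 0; c < 4; ++c) {
        magma_sgemv( MagmaNoTrans, m, n,
                     1.0f, d_A, ldda,
                           x + c, 4,    // incx = 4: c-th component of each float4
                     0.0f, y + c, 4,    // incy = 4
                     queue );
    }

If I read the headers right, the batched interface takes the same incx/incy arguments, so the same trick would apply per batch entry.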



Kind regards
Last edited by Genji on Tue Jul 04, 2017 6:28 am, edited 1 time in total.
Genji
 
Posts: 7
Joined: Mon May 29, 2017 8:58 pm

Re: Batched GEMV with float4

Postby haidar » Mon Jul 03, 2017 10:52 pm

Can you please elaborate in more detail on what you want to do?
Do you mean the float4 CUDA vector type?

I think it might be easiest to cast the type to float and use the single precision SGEMV.
In terms of performance, our GEMV routine reaches the theoretical peak, which is bandwidth/2 for the single precision SGEMV and bandwidth/4 for the double precision DGEMV.
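
To spell out where those numbers come from (a back-of-envelope roofline; the bandwidth figure below is only an assumed example):

Code:
    // gemv is memory bound: each matrix element is loaded once (4 bytes in
    // single precision, 8 in double) and used in one multiply-add (2 flops),
    // so the attainable flop rate is about (2 / element_size) * bandwidth:
    double bw         = 700e9;      // assumed ~700 GB/s GPU memory bandwidth
    double sgemv_peak = bw / 2.0;   // 2 flops / 4 bytes -> ~350 GFlop/s
    double dgemv_peak = bw / 4.0;   // 2 flops / 8 bytes -> ~175 GFlop/s
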
Thanks
Azzam
haidar
 
Posts: 19
Joined: Fri Sep 19, 2014 3:43 pm

Re: Batched GEMV with float4

Postby Genji » Tue Jul 04, 2017 6:26 am

To clarify: do I break up my float4 into 4 gemvs, or use a gemm with a 4-column matrix?
Genji
 
Posts: 7
Joined: Mon May 29, 2017 8:58 pm

Re: Batched GEMV with float4

Postby haidar » Tue Aug 01, 2017 9:13 pm

I think both should provide similar performance, since a gemm with 4 columns will look like 4 gemv's.
This is a memory-bound operation, so its performance will behave like gemv performance.
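
For the gemm variant, one possible formulation (an untested sketch, reusing the illustrative names from the first post): with the interleaved float4 layout, the four component vectors show up as rows rather than columns when the buffers are viewed as column-major float matrices, so the product is written the other way around:

Code:
    // One sgemm instead of 4 gemv's: Yv (4 x m) = Xv (4 x n) * A^T (n x m),
    // where Xv and Yv are the 4 x n and 4 x m float views of d_x and d_y.
    magma_sgemm( MagmaNoTrans, MagmaTrans,
                 4, m, n,                 // result is 4 x m, inner dimension n
                 1.0f, (float*) d_x, 4,   // Xv, leading dimension 4
                       d_A, ldda,         // A, used transposed
                 0.0f, (float*) d_y, 4,   // Yv, leading dimension 4
                 queue );

Either way the same m*n matrix elements are streamed once, which is why the two approaches should perform comparably.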
Azzam
haidar
 
Posts: 19
Joined: Fri Sep 19, 2014 3:43 pm

