### Multiple queues and sgemv_batched

Posted: **Wed Jul 05, 2017 3:52 pm**

Hello,

I have a problem where I have to make 9 different sgemv_batched calls on completely different data, except for the batch of A arrays, which is really the same matrix over and over. So I thought I could parallelize them by creating 9 different queues and assigning one batched sgemv to each queue. However, the total time is still the sum of the times of each batch. I'm using magma_v2, and I declare the queues like so:

int device = 0;

magma_queue_t queue;

magma_queue_create(device, &queue);

So my question is: is it impossible to launch all those batched sgemvs simultaneously, because of the function itself or something else I am unaware of, or am I making a mistake in my execution (in which case I shall post my full code)?

(By the way, I built MAGMA with sequential MKL; not sure if that has anything to do with it.)

The A matrix is the same 128x128 matrix, x and y are vectors of 128 components, and the batchCount is around 16000.

Also, a slightly different question: do 3-4 milliseconds sound OK for each batch on a GTX 970?

Any help would be greatly appreciated.

Cheers
