Page 1 of 1

*_gpu versions of the gesvd routines

Posted: Wed Jul 17, 2019 2:59 am
by gpoole
I have some code which is performing an svd calculation followed by a least squares calculation. I don't want to have to transfer the arrays twice. Is there some reason why the svd routines don't offer *_gpu versions so that I can call two functions sequentially, prefaced by just one device->gpu copy?

For context, the code I'm working with looks like this (in python, using scikit-cuda) ... I'm trying to reduce overheads on this and ultimately, I would like to use Cuda streams to hide the latency all together (since this code gets called repeatedly):

s = np.zeros(min(m, n), np.float32)
u = np.zeros((m, m), np.float32)
vh = np.zeros((n, n), np.float32)

# Compute singular-value decomposition
a1_wrk = np.copy(a1, order='F')
lwork = magma.magma_sgesvd_buffersize('A', 'A', m, n,, m,,, m,, n)
hwork = np.zeros(lwork, np.float32, order='F')
magma.magma_sgesvd('A', 'A', m, n,, m,,, m,, n,, lwork)

# Note, the use of i>10^-3 here; this is meant to select
# values that are effectively non-zero. Results will depend
# somewhat on the choice for this value
rank = sum(1 for i in s if i > 1e-3)

# Query and allocate optimal workspace
# n.b.: - the result for 'x' gets written to the input vector for b, so we just label b->x
# - assume magma variables lda=ldb=m throughout here
x = np.copy(b1, order='F')
lwork = magma.magma_sgels_buffersize('n', m, n, nrhs,, m,, m)
hwork = np.zeros(lwork, np.float32)

# Run solver
a1_wrk = np.copy(a1, order='F')
a1_gpu = gpuarray.to_gpu(a1_wrk)
x_gpu = gpuarray.to_gpu(x)
magma.magma_sgels_gpu('n', m, n, nrhs, a1_gpu.gpudata, m, x_gpu.gpudata, m,, lwork)
x = x_gpu.get()

Re: *_gpu versions of the gesvd routines

Posted: Wed Jul 17, 2019 11:04 am
by mgates3
No, there isn't any particular reason, svd_gpu routines just haven't been developed. Thanks for the request; we'll consider them for future developments.