## GPU interface for dsygvd

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

### GPU interface for dsygvd

Hello,

Is there a GPU interface for the magma_dsygvd function?
The MAGMA routine needs to be called N times in each of M MD steps, and the repeated cudaMalloc and cudaFree calls are hurting the performance of our application.
Note: I am using MAGMA 1.1 on Keeneland.
kaushikragavan

Posts: 11
Joined: Thu May 31, 2012 1:50 pm

### Re: GPU interface for dsygvd

No, right now sygvd has only the CPU interface. If the routines took workspaces for both CPU and GPU, would that solve your problem as well?

Can I ask for a few specifics to get a handle on what you are trying to do?
What size matrix do you have? In double precision, I presume?

How many times do you repeat the sygvd call? I.e., approximate numbers for N and M that you listed.

Are you computing eigenvectors? Do you need all or only some (and how many) eigenvectors?

You said the malloc/free is impacting your performance. Can you quantify that in some sense, compared to the dsygvd time?

-mark
mgates3

Posts: 528
Joined: Fri Jan 06, 2012 2:13 pm

### Re: GPU interface for dsygvd

Dear Mark,

The application currently spends 70% of its time on exact diagonalization, so it would solve my problem if the MAGMA routines could speed that up using heterogeneous (CPU and GPU) workspaces.

I am trying to stabilize a given atomic configuration at each MD step.

I have matrices ranging in order from 1000 to 4000 in double precision.

The sygvd call will be repeated more than 10 times per MD step, and this needs to be done for 1000 MD steps.

Yes, I need eigenvectors at each step. I need all of the eigenvectors as feedback for the next MD step.

Compared to the MKL call, on the order of a few ms is spent in each MD step on the cudaMalloc, cudaMemcpy, and cudaFree calls for magma_dsygvd.

I will try to convert the generalized symmetric-definite problem to a standard eigenvalue problem and make use of the magma_dsyevd GPU interface.
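
For reference, the conversion mentioned above is the standard Cholesky-based reduction. The following is a sketch assuming the problem is of the form $A x = \lambda B x$ with $B$ symmetric positive definite (LAPACK's problem type 1); the symbols $L$, $C$, and $y$ are introduced here for illustration.

$$
\begin{aligned}
A x &= \lambda B x, \qquad B = L L^{T} \quad \text{(Cholesky factorization)} \\
C &= L^{-1} A L^{-T}, \qquad y = L^{T} x \\
C y &= \lambda y \quad \text{(standard symmetric eigenproblem)} \\
x &= L^{-T} y \quad \text{(back-transformation of the eigenvectors)}
\end{aligned}
$$

In LAPACK terms these steps correspond to dpotrf, dsygst, dsyevd, and a triangular solve with dtrsm, which is also the sequence dsygvd runs internally for this problem type.
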
kaushikragavan

Posts: 11
Joined: Thu May 31, 2012 1:50 pm

### Re: GPU interface for dsygvd

It looks like the dsygvd code needs workspace for the matrix A on both the CPU and GPU. For matrix B, it looks like it just needs it on the GPU. As all the underlying routines called from dsygvd have GPU interfaces, it should be straightforward to implement a GPU interface. Basically, add dA and ldda as arguments, and change the arguments B, ldb (on the CPU) to dB, lddb (on the GPU). We'll keep it in mind for future releases, but if you need it now, hopefully that gives you some pointers about what to modify.

You said a few ms are spent each MD step on allocation and copying the matrix. How long does the dsygvd call itself take? That is, what percentage of the time is wasted on allocating memory? I would be a little surprised if allocation was a large overhead, but copying the A and B matrices from CPU to GPU might be expensive. However, unless you can generate the matrices on the GPU, or overlap the copy with other computation, you have to pay that cost at some point. (I assume this is just the alloc and copy in dsygvd itself. The underlying routines also do some allocs and copies, but smaller ones.)

-mark
mgates3

Posts: 528
Joined: Fri Jan 06, 2012 2:13 pm