About time required for launching MAGMA kernels in host

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Post by lakenear » Mon Dec 03, 2018 4:16 am

Hello everyone.

I am using the MAGMA library, specifically magma_dgels_gpu, to obtain least-squares solutions of the equation Ax = b.
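For reference, magma_dgels_gpu takes a host workspace of length LWORK. The MAGMA documentation for this routine states LWORK >= (M - N + NB)*(NRHS + NB) + NRHS*NB, where NB is given by magma_get_dgeqrf_nb(M, N). It also requires M >= N. The sketch below only does that arithmetic, in plain C, so it can be checked without a GPU. The helper name `dgels_gpu_min_lwork` is hypothetical, and `nb` is passed in rather than queried from MAGMA.

```c
#include <stdint.h>

/* Hypothetical helper: minimum LWORK for magma_dgels_gpu, following the
 * documented bound LWORK >= (M - N + NB) * (NRHS + NB) + NRHS * NB.
 * In real code nb would come from magma_get_dgeqrf_nb(m, n); here it is
 * a parameter so the arithmetic can be checked on its own.
 * Returns -1 for arguments the routine does not accept (it needs M >= N >= 0). */
static int64_t dgels_gpu_min_lwork(int64_t m, int64_t n, int64_t nrhs, int64_t nb)
{
    if (n < 0 || m < n || nrhs < 0 || nb < 1)
        return -1;
    return (m - n + nb) * (nrhs + nb) + nrhs * nb;
}
```

For example, with illustrative values M = 1000, N = 500, NRHS = 1 and NB = 64, the helper returns 36724. In a real program NB should come from magma_get_dgeqrf_nb, because the block size may depend on the problem size and the GPU.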

While profiling my algorithm, I found a non-negligible latency when launching the MAGMA kernel magma_dgels_gpu, as shown in the attached file.

Regarding this issue, I have two questions and would be grateful for any advice.

1. Is there any way to reduce the time needed to launch the MAGMA kernel?

2. Can a MAGMA kernel be called from device kernels by means of dynamic parallelism?

Thank you for your help and time.

Best regards.
[Attachment: Inkednvvp capture_LI.jpg (capture of NVVP profiling result)]

Re: About time required for launching MAGMA kernels in host

Post by mgates3 » Mon Dec 03, 2018 10:11 am

It's unclear what is occurring in your example. So that we can attempt to reproduce the issue, we need a sample run using one of the MAGMA testers, or at least sample code. What is the problem size? What is your system: OS, CUDA version, CPU and GPU?

Also, it would help to expand the "[+] Compute" section to show which kernels are actually being launched. Is there more to the computation outside the window shown? From your profile, it's unclear when the MAGMA dgels routine is actually working.

There is no single "magma_dgels_gpu" kernel. It is a hybrid code that launches many cuBLAS kernels (cuBLAS dgemm, etc.). That is, magma_dgels_gpu itself runs on the CPU, while the kernels it launches run on the GPU.
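Because magma_dgels_gpu is an ordinary host function rather than a single GPU kernel, its duration can be measured on the host like any other C call. The clock must stop only after the GPU work it queued has finished. Below is a minimal sketch that assumes POSIX clock_gettime. The helper names are hypothetical, and `no_op` is only a placeholder workload that measures the timing loop's own overhead. In real code the callback would call magma_dgels_gpu and then wait for the device, for example with cudaDeviceSynchronize().

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Seconds from a monotonic clock, unaffected by wall-clock adjustments. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical helper: mean host-side seconds per call of fn(arg) over reps calls.
 * For a hybrid routine, fn should call the routine and then wait for the GPU
 * (e.g. with cudaDeviceSynchronize()) so the queued kernels are included.
 * Returns -1.0 if reps < 1. */
static double time_host_call(void (*fn)(void *), void *arg, int reps)
{
    if (reps < 1)
        return -1.0;
    double start = monotonic_seconds();
    for (int i = 0; i < reps; ++i)
        fn(arg);
    return (monotonic_seconds() - start) / reps;
}

/* Placeholder workload: does nothing, so timing it gives the loop overhead. */
static void no_op(void *arg)
{
    (void)arg;
}
```

Comparing a time measured this way with the kernel rows in the profiler timeline can help show which part of the interval is CPU work inside the routine and which part is GPU kernels.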