testing_dorgqr_m api-trace

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)


Post by roalmar2 » Fri Sep 26, 2014 7:10 am


I am trying to measure the time difference between local and remote computation, so I am looking at what happens inside the functions. (Other ideas are welcome.)

Taking the testing_dorgqr_m example, the time measurement code is:

Code:

            gpu_time = magma_wtime();
            magma_dorgqr_m( m, n, k, hR, lda, tau, hT, nb, &info );
            gpu_time = magma_wtime() - gpu_time;
            gpu_perf = gflops / gpu_time;

Before that, operations such as these take place:

Code:

            magma_dsetmatrix( m, n, hA, lda, dA, ldda );
            magma_dgeqrf_gpu( m, n, dA, ldda, tau, dT, &info );
            magma_dgetmatrix( m, n, dA, ldda, hR, lda );
            magma_dgetmatrix( nb, min_mn, dT, nb, hT, nb );

Is it possible to know which CUDA API calls each of these corresponds to? Or is there a link to documentation? The API trace contains entries such as:

Code:

cudaLaunch (dgemm_sm35_ldg_tn_64x8x128x8x32 [765])
cudaLaunch (void trmm_right_kernel_core<double, int=256, int=4, int=128, bool=0, bool=1, bool=0, bool=0, bool=1>(cublasTrmmParams<double>, double, int) [774])

Thank you very much!!


Re: testing_dorgqr_m api-trace

Post by mgates3 » Tue Sep 30, 2014 4:45 pm

dgeqrf performs many different operations, including dgemm and trmm calls. If you are tracing the code, it may be easiest to use NVTX to add markers to the trace, so you can be sure when each event happens. See the CUDA Profiler User's Guide, chapter 6: NVIDIA Tools Extension.
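
As a rough illustration of the NVTX idea, the sketch below wraps each phase of the tester in a named range, so the setup calls and the timed dorgqr call show up as separate regions in the profiler timeline. nvtxRangePushA and nvtxRangePop are the NVTX calls from nvToolsExt.h. Everything else (the USE_NVTX macro, TraceRange, setup_qr, run_dorgqr, traced_test, and the fallback stubs) is a made-up name for this example, not part of MAGMA or NVTX.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef USE_NVTX
#include <nvToolsExt.h>  // real NVTX header from the CUDA toolkit; link with -lnvToolsExt
#else
// Stand-in stubs (NOT part of NVTX) so the sketch builds without CUDA.
// They only record range names so the nesting can be inspected.
static std::vector<std::string> open_ranges;    // ranges currently open
static std::vector<std::string> closed_ranges;  // ranges in the order they were closed

static int nvtxRangePushA(const char* name) {
    open_ranges.push_back(name);
    return static_cast<int>(open_ranges.size()) - 1;
}

static int nvtxRangePop() {
    assert(!open_ranges.empty());  // a pop without a matching push is a caller bug
    closed_ranges.push_back(open_ranges.back());
    open_ranges.pop_back();
    return static_cast<int>(open_ranges.size());
}
#endif

// Scoped helper (hypothetical name): opens a range on construction and
// closes it when the enclosing block ends, so push/pop always stay paired.
struct TraceRange {
    explicit TraceRange(const char* name) { nvtxRangePushA(name); }
    ~TraceRange() { nvtxRangePop(); }
    TraceRange(const TraceRange&) = delete;
    TraceRange& operator=(const TraceRange&) = delete;
};

// Placeholders for the tester's work; put the real MAGMA calls in these bodies.
static void setup_qr()   { /* magma_dsetmatrix, magma_dgeqrf_gpu, magma_dgetmatrix ... */ }
static void run_dorgqr() { /* magma_dorgqr_m( ... ) */ }

void traced_test() {
    {
        TraceRange r("setup: dgeqrf_gpu");  // kernels launched here belong to the setup
        setup_qr();
    }
    {
        TraceRange r("timed: dorgqr_m");    // kernels launched here belong to the timed call
        run_dorgqr();
    }
}
```

Built with -DUSE_NVTX and run under the profiler, any cudaLaunch entries that fall inside the "setup: dgeqrf_gpu" range come from the setup step rather than from the routine being timed.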

Alternatively, you could replace magma_dgeqrf_gpu( ..., dA, ... ) with lapack_dgeqrf( ..., hA, ... ). That call is only there to produce a valid QR decomposition before running dorgqr.

