Timer for MAGMA

Post by jgpallero » Fri Jan 04, 2013 7:54 am

Hello:

Can I safely use functions such as gettimeofday() (POSIX) to compute the execution time of MAGMA functions? I ask because, for example, in OpenMP omp_get_wtime() should be used, since multiple threads can make other standard timing functions misleading.

Thanks

Re: Timer for MAGMA

Post by mgates3 » Fri Jan 04, 2013 11:31 am

For most MAGMA functions, yes, you can use gettimeofday(). The exception is asynchronous routines, which require a cuda synchronize before & after for timing. All magmablas routines are asynchronous (they return immediately on the CPU, and run async on the GPU). The only other magma routine that I know is async is magma_zlarfb_gpu.
-mark
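
[Editor's sketch] The timing pattern described in this reply can be outlined in C as follows. This is only a minimal sketch under stated assumptions: wall_time(), device_sync(), routine_fn, time_routine() and busy_work() are hypothetical names that do not come from MAGMA or CUDA. device_sync() is left empty so the code runs without a GPU; in a CUDA build its body would presumably be a call to cudaDeviceSynchronize(). busy_work() is a CPU-only stand-in for the routine being timed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, read with POSIX gettimeofday(). */
double wall_time(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Hypothetical placeholder: in a CUDA build this would call
 * cudaDeviceSynchronize(). It is empty here so that the sketch
 * compiles and runs on a machine without a GPU. */
void device_sync(void)
{
}

/* Hypothetical signature for a wrapper around the routine to time. */
typedef void (*routine_fn)(void *arg);

/* Time one call of fn(arg), synchronizing before and after so that
 * asynchronous work is fully inside the measured interval. */
double time_routine(routine_fn fn, void *arg)
{
    assert(fn != NULL);
    device_sync();               /* drain work queued before the timed region */
    double t0 = wall_time();
    fn(arg);                     /* an async routine may only enqueue work here */
    device_sync();               /* wait until the enqueued work has finished */
    return wall_time() - t0;
}

/* CPU-only stand-in for the routine under test: adds 0..999999 to *arg. */
void busy_work(void *arg)
{
    double *sum = (double *)arg;
    for (int i = 0; i < 1000000; ++i)
        *sum += (double)i;
}
```

A call such as time_routine(busy_work, &sum), with sum a double initialized to 0.0, returns the elapsed seconds. For a synchronous MAGMA routine the two device_sync() calls should be harmless; for an asynchronous one, omitting the second call would measure little more than the launch overhead.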

Re: Timer for MAGMA

Post by jgpallero » Sat Jan 05, 2013 9:16 am

Hello and thank you for your answer:

So if magmablas routines are asynchronous, how can I use them in normal programs? For example, imagine I need to multiply two matrices and I want to use magmablas_dgemm. The steps should be:

1. Initialize CUBLAS (cublasInit)
2. Allocate memory in the GPU
3. Copy the involved matrices to the GPU
4. Call magmablas_dgemm
5. Get the results from GPU memory to main memory
6. Free the GPU resources
7. Close CUBLAS (cublasShutdown)

But if magmablas_dgemm is asynchronous, control returns to the CPU right after the call, before the operation has completed on the GPU, and the program continues. However, I need the results of magmablas_dgemm in order to continue, so I understand that the steps listed above are not correct. How can I use magmablas_dgemm correctly?

Cheers

Re: Timer for MAGMA

Post by mgates3 » Sun Jan 06, 2013 12:42 pm

The copy from GPU to main memory can be either sync or async, depending on what version of the routine you call. If sync, then you don't need to do anything special--the copy will execute after the gemm call and block the CPU until the copy finishes. If async, then use the CUDA stream synchronize function before using the result to ensure the copy has finished. All CUDA kernels are async, not just magma ones. This allows you to do useful work on the CPU while the GPU is busy.

If the only operation you do is a single gemm on the GPU, you may not see any performance improvement, as you have to pay data transfer times. Depends on the matrix size.
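
[Editor's sketch] The trade-off above can be made concrete with a back-of-the-envelope model in C. It is a rough sketch, not a measurement: it assumes square n x n double-precision matrices, counts 2n^3 floating-point operations for the gemm, and counts three n x n transfers (A and B to the GPU, C back). All function names are hypothetical, and the bandwidth and flop rates are parameters the caller must supply for their own hardware.

```c
#include <assert.h>

/* Floating-point operations of an n x n x n gemm (multiply + add). */
double gemm_flops(double n)
{
    return 2.0 * n * n * n;
}

/* Bytes moved over the bus: A and B to the GPU, C back to the host,
 * each n x n doubles (8 bytes). Assumes C is not uploaded first. */
double gemm_transfer_bytes(double n)
{
    return 3.0 * 8.0 * n * n;
}

/* Estimated time on the GPU: data transfer plus compute. */
double gpu_gemm_time(double n, double bytes_per_s, double gpu_flops_per_s)
{
    assert(bytes_per_s > 0.0 && gpu_flops_per_s > 0.0);
    return gemm_transfer_bytes(n) / bytes_per_s
         + gemm_flops(n) / gpu_flops_per_s;
}

/* Estimated time on the CPU: compute only, no transfer. */
double cpu_gemm_time(double n, double cpu_flops_per_s)
{
    assert(cpu_flops_per_s > 0.0);
    return gemm_flops(n) / cpu_flops_per_s;
}
```

Because the transfer cost grows as n^2 and the compute cost as n^3, the model predicts that the CPU wins for small n and the GPU wins once n is large enough. Where the crossover falls depends entirely on the rates plugged in, so it is worth timing the real calls rather than trusting the estimate.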

Finally, in general, I would recommend using cublas gemm, as nvidia continues to optimize it for new architectures. Magma really focuses on the higher level routines like getrf (LU factorization).

Re: Timer for MAGMA

Post by jgpallero » Sun Jan 06, 2013 7:35 pm

Hello again:

I always use (in CUBLAS and in MAGMA) the functions cublasGetMatrix and/or cublasGetVector. Are they synchronous or asynchronous? I have not found this in the documentation.

So if I have:

cublasSetMatrix()....
status = cublasDgemm()...
cublasGetMatrix()...

I suppose this is safe, i.e. all operations in dgemm have completed once cublasGetMatrix() returns. As I also use cublasSet and cublasGet in MAGMA, I suppose the same holds for the magmablas_* routines.

And another question about CUBLAS (I know this is not the CUBLAS list, sorry). If CUBLAS routines are all asynchronous, when should I check the output variable status? Right after the call to the CUBLAS routine, or after the last cublasGet, so as to wait for the async CUBLAS routine to finish? I have not found this in the CUBLAS documentation. According to the documentation (of dgemm, for example), the possible error statuses concern erroneous parameters (dimensions, etc.), CUDA not being initialized, and so on. I think these can be checked by the CUBLAS function before it returns control to the CPU, but I have not found information about it in the docs. How is the status of CUBLAS routines checked in MAGMA?

Cheers

Re: Timer for MAGMA

Post by mgates3 » Mon Jan 07, 2013 1:22 pm

cublas{Set,Get}Matrix are synchronous (which isn't stated explicitly in the documentation), while cublas{Set,Get}MatrixAsync are asynchronous, as documented in the CUBLAS guide.
-mark

