concurrent kernel execution in Magma

Post by shingoxlf » Wed Jul 23, 2014 10:40 am

Hi, I am new to MAGMA. Can I run multiple MAGMA GPU functions concurrently?

I know some MAGMA functions are hybrid and some run only on the GPU. How can I check which is which?

I am using these functions:

magma_dgemm(MagmaNoTrans, MagmaNoTrans,n,n,n,scale1,d_a,n,d_b,n,scale2,d_c,n);
magma_dgemv(MagmaNoTrans,n,n,scale1,d_a,n,d_workvec1,inc,scale2,d_workvec2,inc);
magma_dpotrs_gpu(MagmaUpper,k,incx,d_a,k,d_workvec1,k,&info);
magma_dpotrf_gpu(MagmaUpper,blksize,d_a,blksize,&info);
magma_dnrm2(temp,d_a,incx);

Also, if they can be executed concurrently, how can I change my code to do so? Where can I add the stream? Thanks!

Re: concurrent kernel execution in Magma

Post by mgates3 » Wed Jul 23, 2014 6:12 pm

Most MAGMA BLAS kernels are asynchronous and run only on the GPU (not hybrid). This includes gemm, gemv, nrm2.
magma_dgemm, magma_dgemv, and magma_dnrm2 are simply wrappers around the respective cuBLAS functions. They are asynchronous. In the new 1.5 beta 3, the Doxygen documentation (in docs/html/index.html) has a section for BLAS and auxiliary functions.

Most other MAGMA functions are synchronous and hybrid (use both CPU and GPU). This includes posv, potrf, gesv, getrf, etc.

What do you mean by concurrently? If you mean executing two async MAGMA BLAS GPU kernels from a single CPU thread, then you just need to use two different CUDA streams.

Code:
    magmablasSetKernelStream( stream1 );   /* later kernels go to stream1 */
    magma_dgemm( ... );
    magmablasSetKernelStream( stream2 );   /* later kernels go to stream2 */
    magma_dgemm( ... );


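If you mean executing two async MAGMA BLAS GPU kernels from different CPU threads, then you have to be careful to hold a lock around each pair of calls that sets the stream and launches the kernel, so that one thread cannot change the stream between another thread's two calls. Something like this should work (from the magmablasSetKernelStream docs):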

Code:
    thread 1                            thread 2
    ------------------------------      ------------------------------
    lock()
    magmablasSetKernelStream( s1 )
    magma_dgemm( ... )
    unlock()
                                        lock()
                                        magmablasSetKernelStream( s2 )
                                        magma_dgemm( ... )
                                        unlock()


where lock() and unlock() are mutex functions that you provide, e.g. pthread_mutex_lock and pthread_mutex_unlock.
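For illustration, here is a minimal sketch of that locking pattern using pthreads. It does not call MAGMA at all: set_kernel_stream and launch_gemm are hypothetical stand-ins that only model a single "current stream" shared by all threads, and worker_t and run_two_workers are names made up for this example. In real code the two stand-ins would be replaced by magmablasSetKernelStream and magma_dgemm.

```c
#include <pthread.h>

/* Hypothetical stand-ins, NOT MAGMA functions: they model one global
   "current stream" that every thread reads and writes. */
static int current_stream = 0;
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;

static void set_kernel_stream(int s) { current_stream = s; }
static int  launch_gemm(void)        { return current_stream; } /* stream the launch used */

typedef struct {
    int stream;      /* stream this thread wants to launch into */
    int launches;    /* how many launches to issue */
    int mismatches;  /* launches that ended up on another thread's stream */
} worker_t;

static void *worker(void *arg)
{
    worker_t *w = arg;
    for (int i = 0; i < w->launches; ++i) {
        /* Set-stream and launch must form one critical section. */
        pthread_mutex_lock(&stream_lock);
        set_kernel_stream(w->stream);
        int used = launch_gemm();
        pthread_mutex_unlock(&stream_lock);

        if (used != w->stream)
            w->mismatches++;
    }
    return NULL;
}

/* Runs two workers on streams 1 and 2. Returns the total number of
   mismatched launches, or -1 if a thread could not be created. */
int run_two_workers(int launches)
{
    worker_t w1 = { 1, launches, 0 };
    worker_t w2 = { 2, launches, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, worker, &w1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &w2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return w1.mismatches + w2.mismatches;
}
```

Calling run_two_workers(100000) from your own main (built with -pthread) should return 0, because the mutex keeps each set/launch pair together. Note that the lock only needs to cover the two calls, not the kernel's execution on the GPU, so the kernels themselves can still overlap.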

If you mean executing two hybrid MAGMA functions like getrf in different CPU threads, that won't currently work.
-mark
Last edited by mgates3 on Wed Jul 23, 2014 6:56 pm, edited 1 time in total.
Reason: finish sentence about doxygen

Re: concurrent kernel execution in Magma

Post by mgates3 » Fri May 20, 2016 10:42 am

Note that this changed in MAGMA 2.0, using magma_v2.h. Since each gemm call now takes a queue, you no longer call magmablasSetKernelStream, and there is no need for locks.
-mark

