Run time on the CPU is greater than on the GPU

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Ham
Posts: 4
Joined: Thu Aug 24, 2017 5:40 am

Post by Ham » Wed Sep 06, 2017 8:28 am

Hey all,

When I measured the run time on the CPU and the GPU with CPU_TIME(), I found that the run time on the CPU is greater than on the GPU. I use the Fortran interface with the GMRES solver, without a preconditioner, because with the preconditioner it takes more time.
What could cause this?
I run my program on an HP machine with an Intel Xeon E3-1240 v3 (3.4 GHz), compiled with gfortran-4.9.
Below are the run times for the first 6 iterations (of 100):

Code:

iter   CPU run time (s)           |    GPU run time (s)
  1    4.0000000000000036E-003    |    0.35599999999999987
  2    0.0000000000000000         |    1.6000000000000014E-002
  3    4.0000000000000036E-003    |    1.9999999999999574E-002
  4    0.0000000000000000         |    1.6000000000000014E-002
  5    4.0000000000000036E-003    |    2.0000000000003126E-002
  6    0.0000000000000000         |    1.6000000000001791E-002

Thank you in advance.
Last edited by mgates3 on Wed Sep 06, 2017 9:15 am, edited 1 time in total.
Reason: add [code] for readability

mgates3
Posts: 842
Joined: Fri Jan 06, 2012 2:13 pm

Re: Run time on the CPU is greater than on the GPU

Post by mgates3 » Wed Sep 06, 2017 9:58 am

Can you be more specific about what you are doing? For instance:
  • How big is the problem (rows/cols, number of nonzeros)?
  • What format is your matrix (CSR, ...)?
  • Are you including time to transfer the matrix to the GPU, or are you transferring the matrix before?
  • What MAGMA functions are you calling?
  • What model GPU are you using?
I suggest using magmaf_wtime, omp_get_wtime, or MPI_Wtime, instead of cpu_time(). Fortran's cpu_time() seems to measure CPU time used by the process (i.e., time the process is working, similar to getrusage), not elapsed wall clock time. Notably, cpu_time() will not reflect time spent on the GPU. We always use wall time.
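
If none of those timers is handy, the standard Fortran intrinsic system_clock() is another way to get elapsed wall time. The following is only a minimal sketch, not part of MAGMA: the program name, the matrix size n = 500, and the use of matmul() as a stand-in workload are all assumptions chosen for illustration. It times the same work with cpu_time() and with system_clock() so the two can be compared side by side.

```fortran
program wall_vs_cpu
    ! Illustrative sketch: compare cpu_time() with a wall clock built on
    ! the standard intrinsic system_clock(). Names and sizes are hypothetical.
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none

    integer, parameter :: n = 500
    integer(int64) :: count0, count1, count_rate
    real(real64)   :: cpu0, cpu1, wall
    real(real64), allocatable :: A(:,:), B(:,:), C(:,:)

    allocate( A(n,n), B(n,n), C(n,n) )
    A = 1.0_real64
    B = 2.0_real64

    ! count_rate is the number of clock ticks per second (0 if no clock).
    call system_clock( count_rate=count_rate )
    if (count_rate <= 0) error stop 'no system clock available'

    call cpu_time( cpu0 )
    call system_clock( count0 )

    C = matmul( A, B )          ! stand-in for the work being timed

    call system_clock( count1 )
    call cpu_time( cpu1 )

    ! Elapsed wall time in seconds = tick difference / ticks per second.
    wall = real( count1 - count0, real64 ) / real( count_rate, real64 )

    print '(a,f10.6)',  'cpu_time     = ', cpu1 - cpu0
    print '(a,f10.6)',  'system_clock = ', wall
    ! Printing a result keeps the compiler from discarding the workload.
    print '(a,es12.4)', 'checksum     = ', sum( C )
end program wall_vs_cpu
```

Using 64-bit integers for the tick counts gives a fine-grained counter with gfortran and makes wrap-around unlikely during a short measurement; with default integers the resolution can be much coarser.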

Here's a simple test (see code below). When timing sleep(1), cpu_time() measures almost no time, since the process isn't working, but magmaf_wtime() measures the expected 1 second elapsed wall time. When timing gemm() with 1 thread, cpu_time() and magmaf_wtime() measure similar times.

Code:

prompt> setenv OMP_NUM_THREADS 1
prompt> setenv VECLIB_MAXIMUM_THREADS 1
prompt> ./time
sleep(1)
cpu_time     = 0.000139
magmaf_wtime = 1.003619

gemm()
cpu_time     = 0.067450
magmaf_wtime = 0.067607

But when timing gemm() with 2 threads, cpu_time() adds up the time worked in both threads, so it reports about double the elapsed wall clock time that magmaf_wtime() measures.

Code:

prompt> setenv OMP_NUM_THREADS 2
prompt> setenv VECLIB_MAXIMUM_THREADS 2
prompt> ./time
sleep(1)
cpu_time     = 0.000143
magmaf_wtime = 1.005181

gemm()
cpu_time     = 0.081534
magmaf_wtime = 0.041313

Code:

program main
    use magma
    implicit none
    
    double precision :: start, start2, t, t2
    double precision :: A(1000,1000), B(1000,1000), C(1000,1000)
    integer :: n
    double precision :: alpha, beta
    n = 1000
    alpha = 1.0d0
    beta  = 2.0d0
    
    ! Initialize the matrices so gemm() does not operate on undefined data.
    A = 1.0d0
    B = 1.0d0
    C = 0.0d0
    
    ! Time sleep(1): the process is idle, so little CPU time is used.
    call cpu_time( start )
    call magmaf_wtime( start2 )
    print '(a)', 'sleep(1)'
    call sleep(1)
    call cpu_time( t )
    call magmaf_wtime( t2 )
    t = t - start
    t2 = t2 - start2
    print '(a,f8.6)', 'cpu_time     = ', t
    print '(a,f8.6)', 'magmaf_wtime = ', t2
    print '()'
    
    ! Time gemm(): CPU time depends on how many threads the BLAS uses.
    call cpu_time( start )
    call magmaf_wtime( start2 )
    print '(a)', 'gemm()'
    call dgemm( "n", "n", n, n, n, alpha, A, n, B, n, beta, C, n )
    call cpu_time( t )
    call magmaf_wtime( t2 )
    t = t - start
    t2 = t2 - start2
    print '(a,f8.6)', 'cpu_time     = ', t
    print '(a,f8.6)', 'magmaf_wtime = ', t2
    print '()'
end program main

On macOS, compiled with:

Code:

gfortran -Wall -I /opt/magma/include -o time time.f90 -L /opt/magma/lib -Wl,-rpath,/opt/magma/lib -lmagma -framework Accelerate
-mark
