Run time on the CPU is greater than on the GPU

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Post by Ham » Wed Sep 06, 2017 8:28 am

Hey all,

When I measured the run time on the CPU and on the GPU with CPU_TIME(), I found that the run time on the CPU is greater than on the GPU. I use the Fortran interface with the GMRES solver, without a preconditioner, because with a preconditioner it takes more time.
What could cause this?
I run my program on an HP machine with an Intel Xeon E3-1240 v3 (3.4 GHz), compiled with gfortran-4.9.
Below are the run times for the first 6 iterations / 100 iter:

Code: Select all
iter   CPU run time (s)           |    GPU run time (s)
  1    4.0000000000000036E-003    |    0.35599999999999987
  2    0.0000000000000000         |    1.6000000000000014E-002
  3    4.0000000000000036E-003    |    1.9999999999999574E-002
  4    0.0000000000000000         |    1.6000000000000014E-002
  5    4.0000000000000036E-003    |    2.0000000000003126E-002
  6    0.0000000000000000         |    1.6000000000001791E-002


Thank you in advance.
Last edited by mgates3 on Wed Sep 06, 2017 9:15 am, edited 1 time in total.
Reason: add [code] for readability

Re: Run time on the CPU is greater than on the GPU

Post by mgates3 » Wed Sep 06, 2017 9:58 am

Can you be more specific about what you are doing? For instance:
  • How big is the problem (rows/cols, number of nonzeros)?
  • What format is your matrix (CSR, ...)?
  • Are you including time to transfer the matrix to the GPU, or are you transferring the matrix before?
  • What MAGMA functions are you calling?
  • What model GPU are you using?

I suggest using magmaf_wtime, omp_get_wtime, or MPI_Wtime, instead of cpu_time(). Fortran's cpu_time() seems to measure CPU time used by the process (i.e., time the process is working, similar to getrusage), not elapsed wall clock time. Notably, cpu_time() will not reflect time spent on the GPU. We always use wall time.
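
If none of those timers is at hand, the standard intrinsic system_clock also reports elapsed wall-clock time. The following is a minimal sketch, not part of MAGMA: it assumes gfortran (sleep is a GNU extension), and the program name wall_vs_cpu is made up for the example. The one-second sleep merely stands in for time the host spends waiting, for instance on a GPU.

```fortran
program wall_vs_cpu
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none

    integer(int64) :: count0, count1, count_rate
    real(real64)   :: cpu0, cpu1, cpu, wall

    ! system_clock returns a tick counter and the number of ticks per second
    call system_clock( count0, count_rate )
    ! cpu_time returns processor time consumed by this process
    call cpu_time( cpu0 )

    ! idle for one second (GNU extension); the process does no work here
    call sleep(1)

    call cpu_time( cpu1 )
    call system_clock( count1 )

    cpu  = cpu1 - cpu0
    wall = real( count1 - count0, real64 ) / real( count_rate, real64 )

    print '(a,f10.6)', 'cpu_time     = ', cpu
    print '(a,f10.6)', 'system_clock = ', wall
end program wall_vs_cpu
```

Compiled with plain gfortran, one would expect cpu_time to stay close to zero and system_clock to report roughly one second, though the exact figures depend on the system.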

Here's a simple test (the test program is listed further down in this post). When timing sleep(1), cpu_time() measures almost no time, since the process isn't working, but magmaf_wtime() measures the expected 1 second elapsed wall time. When timing gemm() with 1 thread, cpu_time() and magmaf_wtime() measure similar times.
Code: Select all
prompt> setenv OMP_NUM_THREADS 1
prompt> setenv VECLIB_MAXIMUM_THREADS 1
prompt> ./time
sleep(1)
cpu_time     = 0.000139
magmaf_wtime = 1.003619

gemm()
cpu_time     = 0.067450
magmaf_wtime = 0.067607


But when timing gemm() with 2 threads, cpu_time() measures time working in both threads, so it is double the wall clock elapsed time that magmaf_wtime() measures.
Code: Select all
prompt> setenv OMP_NUM_THREADS 2
prompt> setenv VECLIB_MAXIMUM_THREADS 2
prompt> ./time
sleep(1)
cpu_time     = 0.000143
magmaf_wtime = 1.005181

gemm()
cpu_time     = 0.081534
magmaf_wtime = 0.041313
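
In the run above, the ratio of the two gemm() timings is 0.081534 / 0.041313, about 1.97, which is close to the 2 threads used. The same effect can be reproduced without BLAS. The sketch below is an illustration only, assuming gfortran with -fopenmp; the program name, the loop body, and the iteration count are made up for the example. It times an OpenMP loop with both cpu_time and omp_get_wtime and prints their ratio, which would be expected to approach the number of threads when all threads are busy.

```fortran
program threads_cpu_vs_wall
    use omp_lib
    implicit none

    integer, parameter :: n = 50000000   ! arbitrary amount of work
    integer :: i
    double precision :: cpu0, cpu1, wall0, wall1, cpu, wall, s

    s = 0.0d0
    call cpu_time( cpu0 )
    wall0 = omp_get_wtime()

    ! work shared among OMP_NUM_THREADS threads
    !$omp parallel do reduction(+:s)
    do i = 1, n
        s = s + sqrt( dble(i) )
    end do
    !$omp end parallel do

    wall1 = omp_get_wtime()
    call cpu_time( cpu1 )

    cpu  = cpu1 - cpu0
    wall = wall1 - wall0

    print '(a,i0)',     'threads       = ', omp_get_max_threads()
    print '(a,f10.6)',  'cpu_time      = ', cpu
    print '(a,f10.6)',  'omp_get_wtime = ', wall
    if (wall > 0.0d0) print '(a,f10.2)', 'cpu / wall    = ', cpu / wall
    ! print the sum so the compiler cannot discard the loop
    print '(a,es12.4)', 'checksum      = ', s
end program threads_cpu_vs_wall
```

A possible way to try it: build with gfortran -fopenmp, then run once with OMP_NUM_THREADS set to 1 and once with 2, and compare the printed ratios.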


Test program (time.f90):
Code: Select all
program main
    use magma
    implicit none
   
    double precision :: start, start2, t, t2
    double precision :: A(1000,1000), B(1000,1000), C(1000,1000)
    integer :: n
    double precision :: alpha, beta
    n = 1000
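    alpha = 1.0d0
    beta  = 2.0d0

    ! initialize the matrices so dgemm does not operate on undefined values
    call random_number( A )
    call random_number( B )
    C = 0.0d0

    ! time an idle process: CPU time stays near zero, wall time is about 1 second
    call cpu_time( start )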
    call magmaf_wtime( start2 )
    print '(a)', 'sleep(1)'
    call sleep(1)
    call cpu_time( t )
    call magmaf_wtime( t2 )
    t = t - start
    t2 = t2 - start2
    print '(a,f8.6)', 'cpu_time     = ', t
    print '(a,f8.6)', 'magmaf_wtime = ', t2
    print '()'
   
    ! time a compute-bound BLAS call: CPU time is summed over all threads
    call cpu_time( start )
    call magmaf_wtime( start2 )
    print '(a)', 'gemm()'
    call dgemm( "n", "n", n, n, n, alpha, A, n, B, n, beta, C, n )
    call cpu_time( t )
    call magmaf_wtime( t2 )
    t = t - start
    t2 = t2 - start2
    print '(a,f8.6)', 'cpu_time     = ', t
    print '(a,f8.6)', 'magmaf_wtime = ', t2
    print '()'
end program main


On macOS, compiled with:
Code: Select all
gfortran -Wall -I /opt/magma/include -o time time.f90 -L /opt/magma/lib -Wl,-rpath,/opt/magma/lib -lmagma -framework Accelerate


-mark

