Using Magma 1.40 for Acceptance Tests: Any Suggestions?

Postby MM_user » Mon Oct 21, 2013 12:44 pm

We're ordering servers that have two CPUs and two dual-GPU video cards in a 2U rack-mounted chassis. Our concern is that the server might not have enough cooling capacity when both the CPUs and the GPUs are fully utilized.

We're thinking of using MAGMA in the acceptance tests to load both the GPUs and the CPUs heavily, so we can confirm that the servers operate properly and stay cool under load. We're looking for suggestions on which tests to run. I was thinking of running all of the tests in the MAGMA testing directory, but there are more than 300 of them. What would be a good test for our purpose? Ideally, the test should use as many CUDA cores as possible on all four GPUs and as many cores as possible on both CPUs. If the test completes quickly, we can simply rerun it repeatedly for several hours.
MM_user
 
Posts: 3
Joined: Mon Oct 21, 2013 9:04 am

Re: Using Magma 1.40 for Acceptance Tests: Any Suggestions?

Postby mgates3 » Tue Oct 22, 2013 11:27 am

If you just want to create load, I would use either QR (dgeqrf) or Cholesky (dpotrf). E.g.,

testing/testing_dgeqrf

does a double-precision QR factorization for a variety of problem sizes. You can add a range or number of iterations if you need to run for longer. You can also check the results with -c, but the check runs on the CPU, not the GPU.

testing/testing_dgeqrf --range 10000:20000:1000 --niter 5 -c

The other 3 precisions are also useful -- sgeqrf for single, cgeqrf for single-complex, and zgeqrf for double-complex. The complex versions do about 4 times as much computation. As for loading the GPU, there's very little difference between the CPU interface (testing_dgeqrf) and the GPU interface (testing_dgeqrf_gpu).
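To keep the load going for hours, a small wrapper can rerun a tester until a time budget is spent. This is only a sketch in POSIX sh (the commands in this thread use csh syntax). `soak` is a made-up helper name, `date +%s` is assumed to be available on the system, and the early exit assumes the tester returns a non-zero status when something goes wrong, which I have not checked for every MAGMA tester.

```shell
#!/bin/sh
# soak BUDGET_SECONDS COMMAND [ARGS...]
# Rerun COMMAND until BUDGET_SECONDS have elapsed.
# Stops early and returns non-zero on the first failing run.
# (Hypothetical helper, not part of MAGMA.)
soak() {
    budget=$1
    shift
    start=$(date +%s)
    runs=0
    while [ $(( $(date +%s) - start )) -lt "$budget" ]; do
        "$@" || { echo "run $((runs + 1)) failed" >&2; return 1; }
        runs=$((runs + 1))
    done
    echo "completed $runs runs"
}
```

For example, `soak 14400 testing/testing_dgeqrf --range 10000:20000:1000 --niter 5 -c` would keep starting new runs for about four hours. A run that is in progress when the budget expires is allowed to finish.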

-mark
mgates3
 
Posts: 421
Joined: Fri Jan 06, 2012 2:13 pm

Re: Using Magma 1.40 for Acceptance Tests: Any Suggestions?

Postby MM_user » Tue Oct 22, 2013 12:01 pm

Thank you for your suggestion. I have a few follow-up questions. I tried running with --nthread 12 (on a six-core machine) and the results were identical to the single-core run. In both cases, it seems to use only a single CPU core. Is there any way to run this on all CPU cores?

Also, I ran both the CPU interface and the GPU interface, and their run times were identical (as you predicted). Why is that? What is the difference between the two computationally? Are they doing exactly the same thing?

Ideally, I'd like something that runs on both the GPU and CPU at the same time, and uses all cores for each.

Re: Using Magma 1.40 for Acceptance Tests: Any Suggestions?

Postby mgates3 » Tue Oct 22, 2013 3:36 pm

The nthread option is not used for all routines. The best way to see which options are used by any particular tester is to look in the testing code (e.g., testing_dgeqrf.cpp). For some option xyz, look for opts.xyz.

In MAGMA, most CPU threading is left to the LAPACK and BLAS libraries. (Exceptions are some eigenvalue routines where MAGMA uses OpenMP.) For instance, with Intel's MKL, set the $MKL_NUM_THREADS environment variable. Some libraries use OpenMP, so setting $OMP_NUM_THREADS works. If you are using ATLAS, according to the FAQ below, the number of threads is fixed at compile time, and you need to link with the threaded ATLAS libraries (-lptcblas -lptf77blas) instead of the serial ATLAS libraries (-lcblas -lf77blas).
http://math-atlas.sourceforge.net/faq.html#tnum
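As a rough illustration of the environment-variable approach, the sketch below sets both MKL_NUM_THREADS and OMP_NUM_THREADS for a single command. It is written in POSIX sh rather than csh, `with_threads` is a hypothetical helper name, and whether either variable has any effect depends on which BLAS/LAPACK library MAGMA was linked against.

```shell
#!/bin/sh
# with_threads N COMMAND [ARGS...]
# Run COMMAND with MKL_NUM_THREADS and OMP_NUM_THREADS both set to N.
# The variables are set only for that one command, not for the calling shell.
# (Hypothetical helper, not part of MAGMA.)
with_threads() {
    n=$1
    shift
    env MKL_NUM_THREADS="$n" OMP_NUM_THREADS="$n" "$@"
}
```

For example, `with_threads 6 testing/testing_dgeqrf` would run the QR tester with six BLAS threads if the library honors one of these variables. Watching a tool such as top while it runs is a simple way to confirm that more than one core is busy.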

In general, the CPU interface transfers the matrix to the GPU, does the computation, then transfers the result back. It attempts to hide the transfers by overlapping them with part of the computation. The GPU interface doesn't have to transfer the entire matrix, so it can sometimes be a bit faster. The computation itself is generally exactly the same.

MAGMA uses both the CPU and the GPU simultaneously (in general, depending on the algorithm). For instance, for QR, it does the panel factorization on the CPU while doing the previous trailing matrix update on the GPU. We are working on dynamic scheduling to better utilize all the CPU cores.

Since you have multiple GPUs, you may want to use the multi-GPU routines, denoted with _mgpu or _m.

./testing_dgeqrf_mgpu --ngpu 2

If you set $MAGMA_NUM_GPUS, some CPU interfaces will also use multiple GPUs.

setenv MAGMA_NUM_GPUS 2
./testing_dgesv

Interfaces that do this are: geev_m, gehrd_m, geqrf, gesv, getrf, posv, potrf.
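A sketch of how several of these could be exercised in turn, in POSIX sh rather than csh. `run_multi_gpu` is a hypothetical helper, and the tester names are assumed to follow the testing_d<routine> pattern seen with testing_dgeqrf and testing_dgesv. Executables that are not found are skipped rather than guessed at, and geev_m and gehrd_m are left out because I am not sure what their testers are called.

```shell
#!/bin/sh
# run_multi_gpu NGPU TESTING_DIR
# Run the double-precision testers for some of the CPU interfaces listed
# above with MAGMA_NUM_GPUS set to NGPU.
# Assumption: testers are named testing_d<routine>; missing ones are skipped.
# (Hypothetical helper, not part of MAGMA.)
run_multi_gpu() {
    ngpu=$1
    dir=$2
    for routine in geqrf gesv getrf posv potrf; do
        exe="$dir/testing_d$routine"
        if [ -x "$exe" ]; then
            echo "== $routine with MAGMA_NUM_GPUS=$ngpu =="
            env MAGMA_NUM_GPUS="$ngpu" "$exe" || return 1
        else
            echo "skipping $routine: $exe not found" >&2
        fi
    done
}
```

On a machine with four GPUs, `run_multi_gpu 4 .` from inside the testing directory would try each tester with all four. Checking nvidia-smi during the run shows whether every GPU is actually loaded.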

-mark

Re: Using Magma 1.40 for Acceptance Tests: Any Suggestions?

Postby mgates3 » Tue Oct 22, 2013 5:36 pm

Also, why are you trying to use 12 threads on 6 cores? If it is for hyperthreading, that usually doesn't help numerical codes. Just use 6 threads. You can test it with the BLAS gemm if you like. For instance, on a 12-core machine (2 sockets x 6 cores), the CPU Gflop/s increases from 6 threads to 12 threads, but is flat moving to 24 threads. The performance can degrade when using hyperthreading because multiple threads are oversubscribing the functional units. (The exception is the Intel Xeon Phi (MIC), which requires hyperthreading to achieve its memory bandwidth.)

> setenv MKL_NUM_THREADS 6
> ./testing_sgemm -N 5000 -l
    M     N     K   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=========================================================================================================
 5000  5000  5000    794.80 ( 314.54)     716.88 ( 348.73)    132.09 (1892.62)    5.53e-06     5.53e-06


> setenv MKL_NUM_THREADS 12
> ./testing_sgemm -N 5000 -l
    M     N     K   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=========================================================================================================
 5000  5000  5000    794.81 ( 314.54)     710.08 ( 352.07)    233.76 (1069.47)    5.54e-06     5.54e-06


> setenv MKL_NUM_THREADS 24
> ./testing_sgemm -N 5000 -l
    M     N     K   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=========================================================================================================
 5000  5000  5000    794.76 ( 314.56)     711.28 ( 351.48)    231.36 (1080.56)    5.53e-06     5.53e-06
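
The Gflop/s columns can be sanity-checked from the times in parentheses. The sketch below assumes the usual flop count of 2*M*N*K for a matrix multiply; `gflops_gemm` is a hypothetical helper name, written in POSIX sh with awk doing the arithmetic.

```shell
#!/bin/sh
# gflops_gemm M N K MILLISECONDS
# Print the Gflop/s rate of an M x N x K matrix multiply that took the
# given time, assuming 2*M*N*K floating-point operations.
# (Hypothetical helper, not part of MAGMA.)
gflops_gemm() {
    awk -v m="$1" -v n="$2" -v k="$3" -v ms="$4" \
        'BEGIN { printf "%.2f\n", 2 * m * n * k / (ms * 1e-3) / 1e9 }'
}
```

For example, `gflops_gemm 5000 5000 5000 1892.62` prints 132.09, which matches the CPU column of the 6-thread run above.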

