magma_cgesv w/ multiple GPU box

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

magma_cgesv w/ multiple GPU box

Postby mh1 » Thu Mar 14, 2013 4:42 pm

I have successfully downloaded and integrated MAGMA 1.3. I am currently using magma_cgesv in one part of my application.

I have a machine with multiple GPUs. Will magma_cgesv automatically utilize both of those GPUs? If not, how do I tell it which one to use (cudaSetDevice)?

Thank You
mh1
 
Posts: 22
Joined: Thu Mar 14, 2013 4:24 pm

Re: magma_cgesv w/ multiple GPU box

Postby mgates3 » Fri Mar 15, 2013 3:48 pm

You need to set the environment variable MAGMA_NUM_GPUS to the number of GPUs that you want to use. E.g.:

setenv MAGMA_NUM_GPUS 2
./testing_cgesv
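(The `setenv` syntax above is for csh/tcsh. In bash or sh, the equivalent would be:)

```shell
# bash/sh equivalent of the csh `setenv` command above
export MAGMA_NUM_GPUS=2
echo "MAGMA_NUM_GPUS=$MAGMA_NUM_GPUS"   # verify the setting before running
./testing_cgesv
```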

-mark
mgates3
 
Posts: 528
Joined: Fri Jan 06, 2012 2:13 pm

Re: magma_cgesv w/ multiple GPU box

Postby mh1 » Tue May 26, 2015 5:08 pm

Well, I could have multiple (and different) GPUs on a single box. For example, I have a machine in my development environment with the following cards: K20, K20x, K40. How do I make MAGMA use only the K40? Or how would I make it use the K20 and K40? I need the ability to turn a card on/off for MAGMA usage.
mh1
 
Posts: 22
Joined: Thu Mar 14, 2013 4:24 pm

Re: magma_cgesv w/ multiple GPU box

Postby mgates3 » Wed May 27, 2015 1:23 pm

That's a bit cumbersome at the moment. The best I can offer is using CUDA_VISIBLE_DEVICES.

http://devblogs.nvidia.com/parallelfora ... e_devices/
http://acceleware.com/blog/cudavisibled ... sking-gpus
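As a sketch for the mixed K20/K20x/K40 box described above (the device indices here are illustrative assumptions; the actual enumeration order depends on your system, so check nvidia-smi first):

```shell
# Suppose the cards enumerate as 0=K20, 1=K20x, 2=K40
# (hypothetical indices -- verify on your own machine with nvidia-smi).

# Use only the K40:
export CUDA_VISIBLE_DEVICES=2

# Use the K20 and the K40 (the program then sees them as devices 0 and 1):
export CUDA_VISIBLE_DEVICES=0,2
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Note that CUDA_VISIBLE_DEVICES must be set before the CUDA runtime initializes in the process, i.e. before the application is launched.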

Hopefully this will improve in the future as we introduce better queue (stream) support into MAGMA.

-mark
mgates3
 
Posts: 528
Joined: Fri Jan 06, 2012 2:13 pm

Re: magma_cgesv w/ multiple GPU box

Postby mh1 » Fri May 29, 2015 8:58 pm

I have tried CUDA_VISIBLE_DEVICES with cgesv and I do not think it works. I have two machines as follows:

Ubuntu 14.04, 346.46 driver, 3x680
Ubuntu 14.04, 346.46 driver, 4xK10

On each machine I experimented with solving a roughly 30,000x30,000 linear system with cgesv. I tried anywhere from 1 to all devices on each box, and the wall-clock times are essentially the same no matter what I set CUDA_VISIBLE_DEVICES to. In fact, I monitored nvidia-smi during the "solve" part of the problem, and it appears to me that only the first device is used in all cases. I also read the environment variable CUDA_VISIBLE_DEVICES in my program and echo it to verify the variable is active at run-time.

Am I missing something?
mh1
 
Posts: 22
Joined: Thu Mar 14, 2013 4:24 pm

Re: magma_cgesv w/ multiple GPU box

Postby mgates3 » Sun May 31, 2015 1:50 pm

CUDA_VISIBLE_DEVICES limits which devices are available. It is not a count of the number of devices to use.

MAGMA_NUM_GPUS is a count of the number of devices to use for multi-GPU routines that don't take ngpu as an argument.

It's a bit unclear exactly what you did; providing the exact settings would be helpful. If you can reproduce the problem using one of our provided testers, that would also be helpful. Please see the web pages that I linked to that describe CUDA_VISIBLE_DEVICES; in particular, the Acceleware one gives some examples.

Here's an example using 2 GPUs, which are devices 0 and 2. To MAGMA, these appear as devices 0 and 1, but you can verify with nvidia-smi where it is running.

Code: Select all
magma-trunk/testing> setenv MAGMA_NUM_GPUS 2
magma-trunk/testing> setenv CUDA_VISIBLE_DEVICES 0,2
magma-trunk/testing> ./testing_cgesv -N 1000 -N 30000
MAGMA 1.6.2 svn compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16.
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 2
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1000     1     ---   (  ---  )      7.80 (   0.34)   2.96e-10   ok
30000     1     ---   (  ---  )   3393.90 (  21.22)   2.25e-10   ok


[meanwhile, on another terminal]
Code: Select all
magma-trunk/testing> nvidia-smi
...
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     51461    C   ./testing_cgesv                                 96MiB |
|    2     51461    C   ./testing_cgesv                                 96MiB |
+-----------------------------------------------------------------------------+


Because of the increased communication overhead when using multiple GPUs, the benefits aren't seen until the matrix size gets rather large, say, N=30000. The crossover point depends a lot on your CPU, GPUs, and PCI bus. Here are results with 1 to 3 K40c GPUs and a 2x8-core Intel Sandy Bridge Xeon.

Code: Select all
magma-trunk/testing> setenv MAGMA_NUM_GPUS 1
magma-trunk/testing> ./testing_cgesv -N 1000 -N 30000
MAGMA 1.6.2 svn compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16.
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 1
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1000     1     ---   (  ---  )    112.15 (   0.02)   3.44e-10   ok
30000     1     ---   (  ---  )   2250.70 (  31.99)   2.76e-10   ok

magma-trunk/testing> setenv MAGMA_NUM_GPUS 2
magma-trunk/testing> ./testing_cgesv -N 1000 -N 30000
MAGMA 1.6.2 svn compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16.
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 2
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1000     1     ---   (  ---  )      7.82 (   0.34)   2.96e-10   ok
30000     1     ---   (  ---  )   3457.67 (  20.83)   2.25e-10   ok

magma-trunk/testing> setenv MAGMA_NUM_GPUS 3
magma-trunk/testing> ./testing_cgesv -N 1000 -N 30000
MAGMA 1.6.2 svn compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16.
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_cgesv [options] [-h|--help]

ngpu 3
    N  NRHS   CPU Gflop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1000     1     ---   (  ---  )      3.67 (   0.73)   2.96e-10   ok
30000     1     ---   (  ---  )   3516.41 (  20.48)   2.19e-10   ok
mgates3
 
Posts: 528
Joined: Fri Jan 06, 2012 2:13 pm

Re: magma_cgesv w/ multiple GPU box

Postby mh1 » Tue Jun 02, 2015 11:48 am

You are correct. I did not have MAGMA_NUM_GPUS properly set on the runs I attempted. I re-tested those runs and now all looks well. I am seeing nearly a 10x speed-up using MAGMA CGESV while maintaining accuracy (24 MKL CPU threads vs. 3x680 GPUs). Pretty impressive.
mh1
 
Posts: 22
Joined: Thu Mar 14, 2013 4:24 pm
