## multiple gpu

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
boreas
Posts: 10
Joined: Fri Apr 13, 2018 8:17 pm

### multiple gpu

Hello,

I have a general question about how the problem size is limited by the memory available on the GPU. This concerns the generalized eigenvalue problem, say magma_dsygvdx, where both eigenvalues and eigenvectors are needed. If I have one GPU with 16 GB of memory, up to what size of matrix in AX = (lambda)BX can MAGMA solve? Can multiple GPUs help in solving larger matrices?

thanks,
boreas

boreas

### Re: multiple gpu

By the way, does the multi-GPU version of a routine (the _m suffix) adapt to the device count? I set ngpu=2 and ran on a machine with only one GPU, yet it completed successfully without any error or warning. Thanks,

boreas

mgates3
Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

### Re: multiple gpu

Based on tracing the memory usage, magma_dsygvdx() requires about 2.5 n^2 doubles of GPU memory. At various times it needs to hold the matrices A, B, the eigenvectors, plus workspace.

Yes, the multi-GPU version magma_dsygvdx_m() can handle larger matrices. Oddly, even with a single GPU, it appears to require less, about 1.5 n^2 doubles.

Below is a table of actual time and memory usage, on K20m and 2x10-core Haswell (E5-2650 v3).

Code: Select all

``````    n   |   dsygvdx                                   |   dsygvdx_m( ngpu = 1 )
        |      time   GPU memory   GPU memory         |      time   GPU memory   GPU memory
--------+---------------------------------------------+------------------------------------------
 1000   |    0.1181     35.1 MiB   4.60 n^2 doubles   |    0.1280     23.7 MiB   3.11 n^2 doubles
 2000   |    0.4438    115.9 MiB   3.80 n^2 doubles   |    0.4319     55.1 MiB   1.80 n^2 doubles
 3000   |    1.4149    249.8 MiB   3.64 n^2 doubles   |    1.5135    110.6 MiB   1.61 n^2 doubles
 4000   |    2.6409    436.8 MiB   3.58 n^2 doubles   |    2.8571    196.7 MiB   1.61 n^2 doubles
 5000   |    5.1591    679.5 MiB   3.56 n^2 doubles   |    5.3543    293.7 MiB   1.54 n^2 doubles
 6000   |    9.1811    698.0 MiB   2.54 n^2 doubles   |    8.4615    419.6 MiB   1.53 n^2 doubles
 7000   |   13.5773    946.1 MiB   2.53 n^2 doubles   |   12.6406    568.4 MiB   1.52 n^2 doubles
 8000   |   18.9661   1232.3 MiB   2.52 n^2 doubles   |   17.2574    740.1 MiB   1.52 n^2 doubles
 9000   |   24.2933   1558.7 MiB   2.52 n^2 doubles   |   23.6092    934.7 MiB   1.51 n^2 doubles
10000   |   32.4998   1921.2 MiB   2.52 n^2 doubles   |   30.8866   1152.1 MiB   1.51 n^2 doubles
11000   |   41.4299   2321.7 MiB   2.51 n^2 doubles   |   38.9421   1392.5 MiB   1.51 n^2 doubles
12000   |   51.6525   2760.2 MiB   2.51 n^2 doubles   |   49.7362   1655.7 MiB   1.51 n^2 doubles
13000   |   62.0538   3240.0 MiB   2.51 n^2 doubles   |   60.6915   1941.8 MiB   1.51 n^2 doubles
14000   |   74.6738   3754.8 MiB   2.51 n^2 doubles   |   73.3559   2250.9 MiB   1.51 n^2 doubles
15000   |   89.2196   4307.6 MiB   2.51 n^2 doubles   |   90.1193   2582.8 MiB   1.50 n^2 doubles
``````
-mark

mgates3

### Re: multiple gpu

Also, I'm surprised that it would work with ngpu = 2 if you have only 1 GPU. Are you doing that in your own code, or using the MAGMA tester? The MAGMA tester shouldn't allow it.
-mark

boreas

### Re: multiple gpu

hello Mark,

Would you elaborate on how much memory is required on the CPU side and the GPU side? The comments say:

lwork INTEGER
The length of the array WORK.
- If N <= 1, LWORK >= 1.
- If JOBZ = MagmaNoVec and N > 1, LWORK >= 2*N + N*NB.
- If JOBZ = MagmaVec and N > 1, LWORK >= max( 2*N + N*NB, 1 + 6*N + 2*N**2 ).
NB can be obtained through magma_get_dsytrd_nb(N).

This is for CPU memory, right? But it does not say how much GPU memory is needed.

Also, I guess the solvable problem size is bounded by magma_int_t. If it is 32-bit, the maximum solvable problem is less than n = 30,000, in my experience. What do you think?

thanks,

mgates3

### Re: multiple gpu

Yes, the workspaces are on the CPU. It internally allocates GPU workspace.

For all MAGMA routines, magma_int_t needs to address the entire matrix, so if magma_int_t is signed 32-bit, the maximum size is bounded by 2**31 entries, or
n = int( sqrt( 2**31 ) / 32 ) * 32 = 46336
for a square matrix. I rounded down to a multiple of 32 to account for ldda, which is typically a multiple of 32 on the GPU.

For sygvdx in particular, it allocates a workspace of size about 3/2*n*n, so that would limit it to n = 37824. However, when I ran it, I was only able to run successfully up to n = 32000. I'm not sure why there is a discrepancy; there might be another workspace somewhere of size 2*n*n, which would limit it to n = 32768.

Code: Select all

``````./testing_ssygvdx -n 1234,25000:50000:1000 -JV -c

% MAGMA 2.3.0 svn compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 9020, driver 9020. OpenMP threads 20. MKL 2018.0.1, MKL threads 20.
% device 0: Tesla V100-PCIE-16GB, 1380.0 MHz clock, 16160.5 MiB memory, capability 7.0
% Wed Oct 17 10:30:26 2018
% Usage: ./testing_ssygvdx [options] [-h|--help]

% itype = 1, jobz = Vectors needed, uplo = Lower, ngpu = 1
%   N     M   GPU Time (sec)   |AZ-BZD|   |D - D_magma|
%======================================================
25000 25000     44.2406        2.36e-10        7.02e-10  ok
26000 26000     48.9626        2.39e-10        7.75e-10  ok
27000 27000     54.2609        2.34e-10        5.16e-10  ok
28000 28000     57.3955        2.46e-10        5.47e-10  ok
29000 29000     63.4750        2.34e-10        5.70e-10  ok
30000 30000     67.1582        2.30e-10        5.46e-10  ok
31000 31000     73.2874        2.39e-10        8.14e-10  ok
32000 32000     80.7398        2.26e-10        4.87e-10  ok
``````
The easiest solution is to switch to compiling with ILP64 when solving large systems.

-mark

boreas

### Re: multiple gpu

Thanks. Assuming 2*n^2 is needed by sygvdx, is the same amount needed on both the CPU side and the GPU side?
Is it possible to mix MAGMA ILP64 with MKL LP64? Thank you

mgates3

### Re: multiple gpu

A cursory look at the sygvdx and related codes didn't show CPU allocations, so the only one would be the workspace passed to sygvdx, which is indeed O( 2n^2 ).

No, if MAGMA is ILP64, then it requires MKL to be ILP64. Consider info or the ipiv vector in getrf — MAGMA and MKL need to agree whether they are 32-bit or 64-bit. Also, similar issues may strike MKL if only LP64 is used, i.e., it could fail for large matrices due to overflow in indexing.

-mark

boreas

### Re: multiple gpu

Thanks so much for the reply. The 2*N^2 workspace really limits the solution capacity of an LP64 build, and there are some obstacles to using ILP64 in our environments. MKL has other algorithms that need much smaller workspaces; will MAGMA consider implementing them, for example MRRR? Just curious.

mgates3

### Re: multiple gpu

MAGMA has [cz]heevr (complex MRRR eigenvalues). It just has never been ported to real ([sd]syevr), nor put into the generalized problem ([cz]hegvr / [sd]sygvr). In fact, LAPACK doesn't have [cz]hegvr / [sd]sygvr, it appears. So there's a possibility, but we aren't currently actively working on these eigenvalue codes.

-mark