by mgates3 » Fri Aug 03, 2012 8:27 am
Hi Evan,
Compiling MAGMA with magma_int_t defined as long and linking with MKL ILP64, I was able to factor matrices up to n=100,000 using a single GPU. I did have a problem at n=60,000, though, where CUDA reported a kernel launch failure; I'm not sure what the cause is there. I also had to modify the testing program to keep a single copy of A rather than saving a second copy, hence no residual check.
Now, I know you are trying to keep magma_int_t as int and just fix the places where it computes offsets. Also check wherever it allocates memory. It seems that should work, but we haven't done any tests like that. Problems should show up a little below n=50,000 (from n=46,341 on), where n^2 overflows a signed 32-bit int. If you can send your modified code, I can test it and see whether I spot any problems with it.
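To make the overflow threshold concrete, here is a small Python sketch. It is only an illustration, not MAGMA code: the helper names are hypothetical, and ctypes.c_int32 is used to mimic the two's-complement wraparound a 32-bit int typically shows (in C, signed overflow is formally undefined behavior).

```python
import ctypes
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def wrap_int32(value):
    """Mimic storing value in a 32-bit signed int (two's-complement wrap)."""
    return ctypes.c_int32(value).value


def first_overflowing_n():
    """Smallest n whose n*n no longer fits in a signed 32-bit int."""
    return math.isqrt(INT32_MAX) + 1


if __name__ == "__main__":
    print("first n with n*n > INT32_MAX:", first_overflowing_n())
    for n in (40000, 50000, 60000, 100000):
        exact = n * n  # Python ints are arbitrary precision
        print(n, "exact n*n =", exact, "as int32 =", wrap_int32(exact))
```

Any offset or allocation size computed as lda*n in a 32-bit int would hit the same wrap, so those expressions are the ones to promote to a 64-bit type (size_t or long long) before multiplying.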
These results are on remus, a 48-core AMD Opteron 6180 machine at 2.5 GHz (4 sockets x 12 cores) with 256 GB of memory. It has two GPUs installed, but I was using just one. Except for the modified testing program and magma_int_t, this is the stock MAGMA 1.2.1 distribution.
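As a rough sanity check on the memory side, the sketch below estimates the host storage for one n x n double-complex matrix (16 bytes per entry). The function name is hypothetical, and the link to dropping the second copy of A is my assumption: at n=100,000 one copy fits in 256 GB, while two copies would not.

```python
BYTES_PER_ZCOMPLEX = 16  # double-complex: two 8-byte doubles


def matrix_bytes(n, copies=1):
    """Bytes needed for `copies` dense n x n double-complex matrices."""
    return copies * n * n * BYTES_PER_ZCOMPLEX


if __name__ == "__main__":
    n = 100000
    for copies in (1, 2):
        size = matrix_bytes(n, copies)
        print(copies, "copy/copies of A at n =", n, ":",
              size / 1e9, "GB =", round(size / 2**30, 1), "GiB")
```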
setenv MKL_NUM_THREADS 12
numactl --interleave=all --physcpubind=0-11 ./testing_zgetrf -N 1000 [...other sizes...] -N 100000
device 0: Tesla T20 Processor, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla T20 Processor, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
     M       N   CPU GFlop/s (sec)   GPU GFlop/s (sec)      ||PA-LU||/(||A||*N)
================================================================================
  1000    1000      ---   ( --- )       52.92 (    0.05)    ---
  5000    5000      ---   ( --- )      197.25 (    1.69)    ---
 10000   10000      ---   ( --- )      250.48 (   10.65)    ---
 15000   15000      ---   ( --- )      258.73 (   34.79)    ---
 20000   20000      ---   ( --- )      262.55 (   81.25)    ---
 40000   40000      ---   ( --- )      264.29 (  645.76)    ---
 70000   70000      ---   ( --- )      255.69 ( 3577.23)    ---
 80000   80000      ---   ( --- )      254.50 ( 5364.72)    ---
 80000   80000      ---   ( --- )      251.66 ( 5425.38)    ---
 90000   90000      ---   ( --- )      250.28 ( 7767.36)    ---
100000  100000      ---   ( --- )      238.58 (11177.16)    ---
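For anyone wanting to relate the two GPU columns, here is a minimal sketch. It assumes the leading-order operation count for complex LU, about (8/3) n^3 real flops (n^3/3 complex multiplies at 6 flops each plus n^3/3 complex adds at 2 flops each). The tester's exact formula includes lower-order terms, so this is an approximation, and gflops_zgetrf is a hypothetical helper name.

```python
def gflops_zgetrf(n, seconds):
    """Approximate GFlop/s for an n x n complex LU, using ~(8/3) n^3 flops."""
    flops = (8.0 / 3.0) * n**3
    return flops / seconds / 1e9


if __name__ == "__main__":
    # (n, seconds, reported GFlop/s) taken from the table above
    rows = [(10000, 10.65, 250.48),
            (40000, 645.76, 264.29),
            (100000, 11177.16, 238.58)]
    for n, sec, reported in rows:
        print(n, "estimated:", round(gflops_zgetrf(n, sec), 2),
              "reported:", reported)
```

The estimates should land close to the reported rates for the larger sizes; small differences are expected from the dropped lower-order terms and the rounded timings.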
-mark