Hi Stan,

Do you have the 1000 RHSs at once, or do you get them (and solve) one by one in some iterative process?

It's an iterative process. In other words, for Imax = 1000 and N = 10112, the total time is 1000 * 184.706 ms = 184.7 s.

But in reality, to increase the accuracy of my calculations, I have to maximize the size of A. And actually, what is the maximum size of A, and hence of the system (keeping NRHS = 1), that I can allocate on a GTX 295 with almost 1792 MB of GDDR3?
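A rough upper bound on N can be sketched from the storage alone. This is only an estimate under stated assumptions: A is stored as N x N single-precision floats, B as N x NRHS floats, "MB" is read as MiB, and no other workspace, padding, or driver overhead is counted. The function name `max_system_size` is hypothetical and not part of MAGMA or CUBLAS.

```c
#include <stddef.h>

/* Largest N such that an N x N float matrix A plus an N x nrhs float
 * matrix B fit in `bytes` bytes of device memory.
 * Assumption: nothing else is allocated, so the real limit is lower. */
size_t max_system_size(size_t bytes, size_t nrhs)
{
    size_t floats = bytes / sizeof(float);
    size_t lo = 0;   /* largest N known to fit */
    size_t hi = 1;   /* will become the smallest N known not to fit */

    /* Grow hi until N*N + N*nrhs no longer fits. */
    while (hi * (hi + nrhs) <= floats)
        hi *= 2;

    /* Binary search between lo (fits) and hi (does not fit). */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (mid * (mid + nrhs) <= floats)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

With NRHS = 1, `max_system_size((size_t)1792 * 1024 * 1024, 1)` gives N = 21673. The GTX 295 is a dual-GPU card, so a single device may only see about half of that memory; `max_system_size((size_t)896 * 1024 * 1024, 1)` gives N = 15325. In practice the usable size will be smaller, since part of the memory is not available to the application.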

If you have your RHSs on the CPU you can do the slaswp at once on the CPU and send the data only once to the GPU for the triangular solves.

As I said, I only have to compute the LU factorization once and then solve the linear system N times. So I can run slaswp just once, after the LU factorization.
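The "factor once, solve many times" pattern can be illustrated with a small CPU-only sketch in plain C. This is not the MAGMA or LAPACK code: `lu_factor` and `lu_solve` are hypothetical names, the matrix is column-major with leading dimension n, and the pivot indices are 0-based here (LAPACK's IPIV is 1-based). Note that in this sketch the row interchanges are applied to each new right-hand side inside `lu_solve`.

```c
#include <math.h>

/* In-place LU factorization with partial pivoting (column-major, lda = n).
 * ipiv[k] is the 0-based row that was swapped with row k.
 * Returns 0 on success, -1 if a zero pivot is found. */
int lu_factor(int n, float *a, int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)          /* find the pivot row */
            if (fabsf(a[i + k * n]) > fabsf(a[p + k * n]))
                p = i;
        ipiv[k] = p;
        if (a[p + k * n] == 0.0f)
            return -1;
        if (p != k)                              /* swap rows k and p */
            for (int j = 0; j < n; ++j) {
                float t = a[k + j * n];
                a[k + j * n] = a[p + j * n];
                a[p + j * n] = t;
            }
        for (int i = k + 1; i < n; ++i) {        /* eliminate below pivot */
            a[i + k * n] /= a[k + k * n];
            for (int j = k + 1; j < n; ++j)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
        }
    }
    return 0;
}

/* Solve A x = b for one right-hand side, reusing the factors from
 * lu_factor. b is overwritten with x. */
void lu_solve(int n, const float *lu, const int *ipiv, float *b)
{
    for (int k = 0; k < n; ++k) {                /* row interchanges on b */
        float t = b[k];
        b[k] = b[ipiv[k]];
        b[ipiv[k]] = t;
    }
    for (int k = 0; k < n; ++k)                  /* forward: L y = P b */
        for (int i = k + 1; i < n; ++i)
            b[i] -= lu[i + k * n] * b[k];
    for (int k = n - 1; k >= 0; --k) {           /* backward: U x = y */
        b[k] /= lu[k + k * n];
        for (int i = 0; i < k; ++i)
            b[i] -= lu[i + k * n] * b[k];
    }
}
```

In an iterative process, `lu_factor` would be called once before the loop and `lu_solve` once per iteration with the new right-hand side.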

I benchmarked the CPU slaswp routine, including the data copies.

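    start = get_current_time();

    /* Copy B from the GPU to the host workspace. */
    cublasGetMatrix(N, NRHS, sizeof(float), B, N, h_work_M_S, N);

    /* Apply the row interchanges stored in IPIV to rows k1..k2 of B. */
    int k1 = 1;
    int k2 = N;
    int k3 = 1;
    slaswp_(&NRHS, h_work_M_S, &LDB, &k1, &k2, IPIV, &k3);

    /* Send the permuted right-hand side back to the GPU. */
    cublasSetMatrix(N, NRHS, sizeof(float), h_work_M_S, N, B, N);

    end = get_current_time();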

        N    GPU GFlop/s    Time (ms)
    =================================
     1024       16695.93     0.043000
     2048       95583.53     0.060000
     3072      268697.59     0.072000
     4032      546642.44     0.080000
     5184      978208.38     0.095000
     6016     1344698.75     0.108000
     7040     1972103.62     0.118000
     8064     2649402.25     0.132000
     9088     3405177.00     0.147000
    10112     1318399.62     0.523000

So this has almost no influence on the overall speed of the routine: for N = 1024, it takes 0.043 ms out of 4.059 ms.
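For reference, the effect of slaswp with an increment of 1 can be sketched in plain C as below. This is an illustration only, not the LAPACK source: the name `apply_row_swaps` is hypothetical, B is assumed to be column-major with leading dimension ldb, and ipiv is 1-based as in LAPACK.

```c
/* For i = k1..k2 (1-based), swap row i of B with row ipiv[i-1],
 * across all nrhs columns. B is column-major with leading dimension ldb. */
void apply_row_swaps(int nrhs, float *b, int ldb,
                     int k1, int k2, const int *ipiv)
{
    for (int i = k1; i <= k2; ++i) {
        int p = ipiv[i - 1];          /* 1-based row to swap with row i */
        if (p == i)
            continue;                 /* nothing to do for this row */
        for (int j = 0; j < nrhs; ++j) {
            float t = b[(i - 1) + j * ldb];
            b[(i - 1) + j * ldb] = b[(p - 1) + j * ldb];
            b[(p - 1) + j * ldb] = t;
        }
    }
}
```

For example, with b = {1, 2, 3}, ipiv = {3, 3, 3}, k1 = 1 and k2 = 3, the swaps are applied in order and b becomes {3, 1, 2}. With NRHS = 1 this touches at most N pairs of floats, which is consistent with the very small times measured above.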

And my last question: can you explain what the hwork array is?

All I know is this, from the MAGMA guide:

HWORK (workspace) REAL array, dimension N*NRHS

PS:

Performance of SGETRS_GPU with NRHS = 1:

        N    GPU GFlop/s    || b-Ax || / ||A||     Time (ms)
    ========================================================
     1024         176.87          2.513783e-07      4.059000
     2048         517.23          2.111665e-06     11.088000
     3072         911.57          5.364181e-06     21.223000
     4032        1282.67          6.626522e-07     34.094000
     5184        1748.31          6.161365e-07     53.154000
     6016        2092.16          2.017927e-06     69.415000
     7040        2494.92          1.475847e-06     93.273000
     8064        2919.91          3.561859e-06    119.771000
     9088        3324.24          1.063919e-06    150.579000
    10112        3733.08          8.097373e-07    184.706000