Hi,
I have been trying to use the ILU preconditioner in sparse-iter with the iterative linear solvers. I ran the testing_zsolver example on LAPLACE2D of size 200 with PBiCGSTAB, once with the Jacobi preconditioner and once with the ILU preconditioner. The linear solve finishes in about 0.2 seconds with Jacobi, but takes about 300 seconds with ILU. Also, what is the difference between the norm2 residual and the exact residual?
# matrix info: 40000-by-40000 with 199598 nonzeros
%=============================================================%
% BiCGSTAB performance analysis every 100 iteration
%=============================================================%
% iter  residualnrm2  runtime
%=============================================================%
0  2.000000e+02  0.000000
100  3.552788e+01  0.057975
200  4.275729e-02  0.116186
300  4.799239e-08  0.174211
%=============================================================%
%=============================================================%
% PBiCGSTAB solver summary:
% initial residual: 2.000000e+02
% iterations: 371
% exact final residual: 2.415949e-09
% runtime: 0.2192 sec
%=============================================================%

%=============================================================%
% BiCGSTAB performance analysis every 100 iteration
% Preconditioner used: iterative ILU(0).
%=============================================================%
% iter  residualnrm2  runtime
%=============================================================%
0  2.000000e+02  0.000000
100  1.483437e-04  127.146813
200  2.915042e-11  254.295898
%=============================================================%
%=============================================================%
% PBiCGSTAB solver summary:
% initial residual: 2.000000e+02
% iterations: 233
% exact final residual: 1.990934e-09
% runtime: 296.2562 sec
%=============================================================%
Thanks,
Harshad
What's wrong with the ILU sparse preconditioner?

Re: What's wrong with the ILU sparse preconditioner?
Dear Harshad,
The test problem you are addressing (Laplace 2D of size 40,000) is a very structured, very sparse problem. In particular, it contains only about 5 nnz/row. This is why operations like SpMV or Jacobi preconditioning are very fast: every row can be processed independently, so they parallelize by rows.
For the ILU preconditioner, two triangular systems have to be solved via forward/backward substitution in each iteration. Although NVIDIA's level scheduling algorithm is used, the triangular solves are hard to parallelize and become the bottleneck, which significantly increases the runtime of each iteration. The only advantage is that ILU cuts down the total number of iterations needed for convergence (233 instead of 371 in your runs). However, as every iteration takes significantly longer, this advantage is not reflected in the overall runtime.
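To make the dependency argument concrete, here is a minimal pure-Python sketch that computes the level schedule of a lower triangular factor. It assumes a standard 5-point stencil on an n x n grid with natural row-major ordering, and that the ILU(0) factor L has the same pattern as the strictly lower part of the matrix. The function names are illustrative only and are not part of MAGMA or cuSPARSE.

```python
from collections import Counter


def lower_pattern_laplace2d(n):
    """Dependencies of each row in the strictly lower part of the
    5-point Laplacian on an n x n grid (row-major ordering).
    Assumption: ILU(0) keeps this pattern for the factor L."""
    deps = []
    for i in range(n):
        for j in range(n):
            row_deps = []
            if i > 0:
                row_deps.append((i - 1) * n + j)  # neighbour one grid row up
            if j > 0:
                row_deps.append(i * n + j - 1)    # neighbour to the left
            deps.append(row_deps)
    return deps


def level_schedule(deps):
    """Level of a row = 1 + max level of the rows it depends on.
    Rows in the same level can be solved in parallel; levels run in order."""
    level = [0] * len(deps)
    for row, row_deps in enumerate(deps):
        if row_deps:
            level[row] = 1 + max(level[d] for d in row_deps)
    return level


n = 200
levels = level_schedule(lower_pattern_laplace2d(n))
rows_per_level = Counter(levels)
num_levels = len(rows_per_level)
widest = max(rows_per_level.values())
print(f"rows: {n * n}, sequential levels: {num_levels}, widest level: {widest} rows")
```

Under these assumptions the forward solve needs 399 sequential steps with at most 200 rows in flight at a time, whereas Jacobi or SpMV can work on all 40,000 rows at once. The backward solve has the mirrored structure.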
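The difference between the implicitly computed residual and the exact residual comes from rounding effects. The implicit residual is the one the solver updates recursively in every iteration, while the exact residual is computed explicitly as b - Ax from the final solution. In exact arithmetic the two would be identical, but in floating point the implicit residual drifts away from the exact residual over the course of the iteration.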
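The two residuals can be compared in a small experiment. The sketch below is not the MAGMA code: it uses plain unpreconditioned CG instead of BiCGSTAB, purely to keep it short, on a 1D Laplacian of size 50 chosen for illustration. The residual recurrence plays the same role as the implicit residual in the solver output.

```python
import math


def matvec(x):
    """y = A x for the 1D Laplacian tridiag(-1, 2, -1)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm2(u):
    return math.sqrt(dot(u, u))


def cg(b, tol=1e-10, max_iter=500):
    """Plain CG. Returns the solution and the implicit residual,
    i.e. the vector r that is only ever updated by recurrence."""
    x = [0.0] * len(b)
    r = list(b)                      # r0 = b - A*0
    p = list(r)
    rr = dot(r, r)
    stop = tol * norm2(b)
    for _ in range(max_iter):
        if math.sqrt(rr) <= stop:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # implicit update
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, r


b = [1.0] * 50
x, r_implicit = cg(b)
r_exact = [bi - yi for bi, yi in zip(b, matvec(x))]   # recomputed from x

implicit_norm = norm2(r_implicit)
exact_norm = norm2(r_exact)
print(f"implicit residual norm: {implicit_norm:.6e}")
print(f"exact residual norm:    {exact_norm:.6e}")
```

The two printed norms generally differ slightly even though they coincide in exact arithmetic. This is the same effect as in your ILU run, where the last logged implicit residual is about 2.9e-11 while the exact final residual is about 2.0e-09.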
Let me know whether this answers your questions.
Hartwig