### About technical details of mixed precision routine DSGESV

Posted:

**Fri Dec 16, 2016 3:00 pm**

Hello:


I'm thinking of using the DSGESV routine to solve some linear systems Ax=b, and I have some questions about its accuracy and its implementation in LAPACK. First, the big question: can DSGESV attain the same accuracy in x as the full double precision routine DGESV? As I understand it after reading [1], the answer is yes. But in [2, page 234], [3, pages 108-109] and [4, page 140] one can read that the maximum attainable relative accuracy for x (in the infinity norm, using mixed precision with unit roundoffs u_1 and u_2, where u_2 = u_1^2) is

`||x-hat{x}||/||x|| ~ u_1`

I think my doubt comes from a misunderstanding on my part: in the mixed-precision iterative refinement algorithms presented in these books, the original A and b are stored in precision u_1 (although the residual is computed in u_2), whereas in DSGESV they are stored in u_2 and used there for the computation r = b - Ax. Am I right?
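To make sure I understand the precision placement, here is a NumPy sketch of the refinement loop as I read it. This is only an illustration of the u_1/u_2 pattern, not the DSGESV source; in particular, the real routine factorizes A once in single precision and reuses the LU factors for every correction solve, while this sketch refactorizes inside each `np.linalg.solve` call.

```python
import numpy as np

def mixed_precision_solve(A, b, max_iter=30):
    """Sketch of DSGESV-style mixed-precision iterative refinement.

    A and b are given in double precision (u_2). The correction solves
    are done in single precision (u_1), while the residual r = b - A x
    and the solution update are computed in double precision (u_2).
    """
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    n = A.shape[0]
    u2 = np.finfo(np.float64).eps
    anrm = np.linalg.norm(A, np.inf)
    for _ in range(max_iter):
        r = b - A @ x                       # residual in double precision (u_2)
        # LAPACK-style test: ||r||_Inf <= ||x||_Inf * ||A||_Inf * u_2 * sqrt(N)
        if np.linalg.norm(r, np.inf) <= np.linalg.norm(x, np.inf) * anrm * u2 * np.sqrt(n):
            break
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in single (u_1)
        x = x + d.astype(np.float64)        # update accumulated in double (u_2)
    return x
```

With a reasonably conditioned A this converges to double precision accuracy in a few iterations, which matches my reading of [1].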

Another question is about the stopping criterion used in DSGESV. I can see in [5] and [6] that the criterion is

`||r||_2 <= ||x||_2*||A||_F*u_2*sqrt(N)`

but in the current LAPACK the criterion is

`||r||_Inf <= ||x||_Inf*||A||_Inf*u_2*sqrt(N)`

So my question is: what is the difference between the criterion based on the 2-norm and the Frobenius norm and the one based on the infinity norm?
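For what it's worth, here is a small NumPy experiment of mine (not from the references) evaluating both tests on the residual of a plain double precision solve; by norm equivalence the two thresholds can differ by at most polynomial-in-N factors, so I would expect them to behave similarly in practice:

```python
import numpy as np

# Evaluate both stopping tests on the residual of a double precision solve.
rng = np.random.default_rng(1)
n = 100
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)       # stand-in for a converged DSGESV iterate
r = b - A @ x
u2 = np.finfo(np.float64).eps

# Criterion from [5], [6]: 2-norm residual against the Frobenius norm of A.
ratio_2F = np.linalg.norm(r, 2) / (
    np.linalg.norm(x, 2) * np.linalg.norm(A, 'fro') * u2 * np.sqrt(n))
# Criterion in the current LAPACK dsgesv: infinity norms throughout.
ratio_inf = np.linalg.norm(r, np.inf) / (
    np.linalg.norm(x, np.inf) * np.linalg.norm(A, np.inf) * u2 * np.sqrt(n))

print(ratio_2F, ratio_inf)      # a test passes when its ratio is <= 1
```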

I've read [5], but I can't find the derivation from which this stopping criterion comes. Is there any document where I could find it?

In [3, page 99] it can be seen that, for the solution of Ax=b via LU with partial pivoting, provided the growth factor is not large, one can state that

`||r||_Inf/(||x||_Inf*||A||_Inf) <= N*u,`

which is similar to the DSGESV stopping criterion, but with N instead of sqrt(N). Is there any objective reason for this, or am I mixing concepts?
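A quick experiment (again NumPy; `numpy.linalg.solve` calls LAPACK's dgesv underneath) suggests that for random matrices the ratio ||r||_Inf/(||x||_Inf*||A||_Inf) sits far below both sqrt(N)*u and N*u, so my guess is that N*u is the worst-case bound while sqrt(N) is closer to the statistically expected rounding growth, but I am not sure that is the actual rationale:

```python
import numpy as np

# How does ||r||_Inf / (||x||_Inf * ||A||_Inf) compare with sqrt(N)*u and N*u
# for LU with partial pivoting on random matrices of growing size?
rng = np.random.default_rng(3)
u = np.finfo(np.float64).eps
for n in (50, 200, 800):
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)   # LAPACK dgesv under the hood
    r = b - A @ x
    eta = np.linalg.norm(r, np.inf) / (
        np.linalg.norm(x, np.inf) * np.linalg.norm(A, np.inf))
    print(n, eta / (np.sqrt(n) * u), eta / (n * u))
```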

Also, I can see in [2, page 104, exercise 7.2] that the forward error is bounded (for any subordinate norm) by

`||r||/(||A||*||x||) <= ||x-hat{x}||/||x|| <= k(A)*||r||/(||A||*||x||),`

so an upper bound for ||x-hat{x}||/||x|| is k(A)*||r||/(||A||*||x||), and using the relation ||r||_Inf/(||x||_Inf*||A||_Inf) <= N*u, with u = u_2 here, I could write

`||x-hat{x}||/||x|| <= N*u_2*k_Inf(A)`

or

`||x-hat{x}||/||x|| <= sqrt(N)*u_2*k_Inf(A)`

if I use the stopping criterion from LAPACK instead. Is this upper bound, of order u_2, for the forward error of DSGESV right?
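As a sanity check, this NumPy snippet compares the actual forward error with both the residual-based bound and the a priori bound N*u_2*k_Inf(A) on a system with a known solution (note that the bound in [2] uses ||x|| of the exact solution; the computed hat{x} is used as a stand-in here):

```python
import numpy as np

# Forward error vs. the two upper bounds, on a system built so that
# the exact solution x_true is known.
rng = np.random.default_rng(2)
n = 100
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
r = b - A @ x_hat
u2 = np.finfo(np.float64).eps

kappa_inf = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
err = np.linalg.norm(x_true - x_hat, np.inf) / np.linalg.norm(x_true, np.inf)

# Residual-based bound from [2, exercise 7.2], with x_hat in place of x.
bound_r = kappa_inf * np.linalg.norm(r, np.inf) / (
    np.linalg.norm(A, np.inf) * np.linalg.norm(x_hat, np.inf))
# A priori bound N * u_2 * k_Inf(A).
bound_N = n * u2 * kappa_inf

print(err, bound_r, bound_N)
```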

**References**

[1] A. Buttari, J. Dongarra, J. Langou, P. Luszczek and J. Kurzak, (2007), Mixed precision iterative refinement techniques for the solution of dense linear systems

[2] N.J. Higham, (2002), Accuracy and stability of numerical algorithms. 2nd ed., SIAM

[3] A. Björck (2015), Numerical methods in matrix computations. Springer

[4] G.H. Golub and C.F. Van Loan (2013), Matrix computations. 4th ed., Johns Hopkins University Press

[5] J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari and J. Dongarra, (2006), LAWN 175: Exploiting the performance of 32 bit floating point arithmetic in Obtaining 64 bit accuracy (revising iterative refinement for linear systems)

[6] http://icl.cs.utk.edu/iter-ref/custom/i ... 6&slid=207

Thank you