LINPACK - why should LDA > N?



Post by adrian88 » Tue Mar 07, 2017 11:46 pm

In reviewing the extended help from Intel, I have a question.

Specifically, regarding the leading dimension of the array, the documentation says:

"The leading dimension must be no less than the number of equations. Experience has shown that the best performance for a given problem size is obtained when the leading dimension is set to the nearest odd multiple of 8 (16 for Intel(R) Itanium(R) 2 processors) equal to or larger than the number of equations (divisible by 8 but not by 16, or divisible by 16 but not 32 for Intel(R) Itanium(R) 2 processors)."

Why does that give the best performance? Do you have any details on how it is connected with the cache line size? I'm trying to understand how setting LDA = N+8 would be beneficial.
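The placement rule quoted from Intel's documentation can be written as a small helper. This is a sketch of the arithmetic only, not Intel's code; `recommended_lda` and its `base` parameter (8, or 16 for Itanium 2) are hypothetical names.

```python
def recommended_lda(n, base=8):
    """Smallest odd multiple of `base` that is >= n (hypothetical helper).

    base=8 gives 'divisible by 8 but not by 16';
    base=16 gives 'divisible by 16 but not by 32' (the Itanium 2 variant).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    m = -(-n // base)      # ceiling division: smallest m with m*base >= n
    if m % 2 == 0:         # even multiple of base -> step up to the next odd one
        m += 1
    return m * base


for n in (64, 1000, 1001, 1024):
    print(n, recommended_lda(n))
```

For N=1024 this gives LDA=1032, i.e. N+8; in general the rule adds 8 whenever N is already divisible by 16. For N=1000, which is already an odd multiple of 8, it leaves LDA=N.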

Thanks!!
adrian88
 

Re: LINPACK - why should LDA > N?

Post by Julien Langou » Thu Mar 09, 2017 9:51 am

Hi Adrian.

(Your question applies to LAPACK, LINPACK, BLAS, and anything else that uses matrices with a leading-dimension argument.)

If the question is `why should LDA > N?`, the first answer is `because we want to work on submatrices.` In column-major storage, LDA is the distance in memory between the starts of consecutive columns, so it must be at least the number of rows of the matrix being described. For example, if A is a 20-by-20 matrix and we want to work on the submatrix A(3:5,5:10), we use M=3, N=6, PTR=&(A(3,5)), LDA=20 to describe it: the submatrix has 3 rows and 6 columns, but its consecutive columns are still 20 elements apart in memory. If this is not clear, please let me know and I can explain more. In short, LDA was originally introduced to handle submatrices.
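To make the submatrix example concrete, here is a small sketch that stores a 20-by-20 matrix column-major in a flat Python list (standing in for Fortran/C storage) and reads A(3:5,5:10) back using only M, N, a starting offset, and LDA. The helper names `col_major_offset` and `submatrix` are hypothetical; indices are 1-based to match the post.

```python
def col_major_offset(i, j, lda):
    """0-based offset of 1-based element (i, j) in column-major storage."""
    return (i - 1) + (j - 1) * lda


def submatrix(buf, offset, m, n, lda):
    """Read an m-by-n matrix that starts at buf[offset] with leading dimension lda."""
    assert lda >= max(1, m), "LDA must be at least the number of rows"
    return [[buf[offset + (k - 1) + (l - 1) * lda] for l in range(1, n + 1)]
            for k in range(1, m + 1)]


# Fill a 20-by-20 matrix whose element (i, j) is the tuple (i, j).
LDA = 20
A = [None] * (LDA * 20)
for j in range(1, 21):
    for i in range(1, 21):
        A[col_major_offset(i, j, LDA)] = (i, j)

# Describe A(3:5, 5:10) as in the post: M=3, N=6, PTR=&A(3,5), LDA=20.
ptr = col_major_offset(3, 5, LDA)   # offset 82 in the flat buffer
sub = submatrix(A, ptr, 3, 6, LDA)
print(sub[0][0], sub[2][5])          # prints (3, 5) (5, 10)
```

The submatrix is never copied: only the starting offset changes, while LDA stays 20 because that is the column spacing of the parent array.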

Now, `why should LDA sometimes be > N for performance reasons?` Yes, this has to do with the cache. I am not an expert on all this, but for example if N=64, it can sometimes make sense for performance to allocate A with LDA=65. The goal is to spread the columns of A over as many different cache sets (groups of cache lines) as possible, as opposed to having them all map onto just a few. When LDA times the element size is a multiple of a large power of 2, the elements of one row of A, which sit LDA elements apart, all map to the same few sets. Then each time you load an element, you are likely to evict a cache line you still need, so the same lines get evicted and reloaded over and over again. If someone wants to explain more, please go ahead.
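The cache effect can be illustrated with a toy model. The sketch below assumes a hypothetical 16 KiB, 4-way set-associative cache with 64-byte lines (so 64 sets) and LRU replacement, 8-byte double-precision elements, and a matrix that starts on a line boundary; it counts misses when row 1 of a 64-column matrix is read twice. Real caches, prefetchers, and LAPACK access patterns are more complicated, so treat the numbers as an illustration only. `row_sweep_misses` is a hypothetical helper.

```python
from collections import OrderedDict


def row_sweep_misses(n, lda, passes=2, elem_bytes=8, line_bytes=64,
                     num_sets=64, ways=4):
    """Count cache misses when reading row 1 of an n-column, column-major
    matrix `passes` times through a hypothetical set-associative LRU cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for _ in range(passes):
        for j in range(n):
            # A(1, j+1) lives j*lda elements after A(1, 1)
            line = (j * lda * elem_bytes) // line_bytes
            s = sets[line % num_sets]
            if line in s:
                s.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1
                if len(s) == ways:
                    s.popitem(last=False)  # set full: evict least recently used
                s[line] = True
    return misses


print(row_sweep_misses(64, 64), row_sweep_misses(64, 65))  # prints 128 64
```

With LDA=64 the column stride is 512 bytes (8 lines), so the 64 row elements fall into only 8 of the 64 sets, 8 lines competing for each 4-way set, and every access misses on both passes. With LDA=65 each element lands in its own set and the second pass is all hits. Under the same assumptions LDA=72 (Intel's odd multiple of 8) also gives 64 misses, because a 576-byte stride is 9 lines and 9 is odd.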

See the graph below for ZGGEV. I forget the architecture; it was a while back (10 to 15 years ago). Here LDA=N. You can clearly see that when N is a power of 2, the time is much larger than when it is not. A remedy is to take LDA=N+1 when N is a power of 2. (This remedy is not shown on the curve.) This curve is just one example among many.
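The remedy can be sketched as follows, assuming ZGGEV's COMPLEX*16 elements (16 bytes) and a hypothetical cache geometry of 64-byte lines and 64 sets. `remedy_lda` and `sets_touched` are hypothetical helpers; `sets_touched` counts how many distinct cache sets the elements of one row of A map to.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def remedy_lda(n):
    # The remedy from the post: pad by one element when N is a power of 2.
    return n + 1 if is_power_of_two(n) else n


def sets_touched(n, lda, elem_bytes=16, line_bytes=64, num_sets=64):
    # Hypothetical cache geometry; elem_bytes=16 matches COMPLEX*16 (ZGGEV).
    stride = lda * elem_bytes
    return len({(j * stride // line_bytes) % num_sets for j in range(n)})


for n in (200, 256):
    lda = remedy_lda(n)
    print(n, lda, sets_touched(n, n), sets_touched(n, lda))
# prints:
# 200 200 32 32
# 256 257 1 64
```

For N=256 with LDA=N the column stride is exactly 4096 bytes, so every element of a row maps to the same set; LDA=257 spreads them over all 64 sets.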

Cheers,
Julien

[Figure: time for ZGGEV (I forgot the machine architecture)]

