testing examples: why is ldda a multiple of 32?
I noticed that in the testing examples for some routines, such as zgemm and zgesv, the leading dimension (ldda) is set to a multiple of 32. Is there a reason this is preferred?
Re: testing examples: why is ldda a multiple of 32?
Calling cudaMalloc aligns the start of the allocated floating-point data so that accesses at the beginning of the data are fully coalesced. On cards before Fermi, the starting address had to be aligned to 16*sizeof(type). For this to hold for every column after the first as well, the leading dimension (lda) must be divisible by 16. We make the requirement slightly stronger (divisibility by 32) in anticipation of hardware changes that may require it.
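To make the arithmetic concrete, here is a minimal host-side sketch in C. The helper names (round_up, column_offset_bytes, columns_aligned) are hypothetical and written for this post; they are not MAGMA or CUDA functions. The sketch assumes column-major storage and a base pointer that is already aligned, as cudaMalloc provides, so only the offsets of the column starts need checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of align. */
static int round_up(int n, int align)
{
    assert(n >= 0 && align > 0);
    return ((n + align - 1) / align) * align;
}

/* Byte offset of the start of column j in a column-major matrix
   whose leading dimension is ldda elements of elem_size bytes each. */
static size_t column_offset_bytes(int j, int ldda, size_t elem_size)
{
    return (size_t)j * (size_t)ldda * elem_size;
}

/* Returns 1 if the first n column starts all fall on a
   16*elem_size byte boundary relative to the (aligned) base, else 0. */
static int columns_aligned(int n, int ldda, size_t elem_size)
{
    size_t boundary = 16 * elem_size;
    for (int j = 0; j < n; ++j) {
        if (column_offset_bytes(j, ldda, elem_size) % boundary != 0)
            return 0;
    }
    return 1;
}
```

For example, with m = 1000 rows of double-complex data (16 bytes per element), round_up(1000, 32) gives ldda = 1024, and columns_aligned(4, 1024, 16) returns 1. Using ldda = 1000 directly returns 0, because 1000 is not divisible by 16 and the second column already starts off the boundary. The device allocation would then be sized as ldda * n elements rather than m * n.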
Stan