testing examples: why is ldda a multiple of 32?
Posted: Thu Jul 21, 2011 2:00 pm
I noticed that in the testing examples for some routines, like zgemm and zgesv, the leading dimension (ldda) is set to a multiple of 32. Is this preferred for some reason?
Re: testing examples: why is ldda a multiple of 32?
Posted: Mon Jul 25, 2011 12:02 pm
cudaMalloc returns a pointer that is properly aligned, so accesses at the beginning of the allocated floating point data are fully coalesced. On cards before Fermi, the starting address had to be aligned to 16*sizeof(type). The matrices are stored column-major, so column j starts j*lda elements past the first one; for the alignment to hold for columns after the first, lda has to be divisible by 16. We make this requirement a little stronger (divisibility by 32) in anticipation of hardware changes that may require it.
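
To make the arithmetic concrete, here is a minimal sketch in C of rounding a row count up to a multiple of 32 and checking where each column starts. The helper names round_up and column_offset_bytes are made up for illustration and are not part of any library; the sketch also assumes the base pointer is already aligned, as described above for cudaMalloc.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function):
 * round n up to the next multiple of align, with align > 0. */
static int round_up(int n, int align)
{
    return ((n + align - 1) / align) * align;
}

/* Hypothetical helper: byte offset of the start of column j in a
 * column-major matrix with leading dimension ldda and elements of
 * elem_size bytes. The offset is relative to the base pointer,
 * which is assumed to be aligned already. */
static size_t column_offset_bytes(int j, int ldda, size_t elem_size)
{
    return (size_t)j * (size_t)ldda * elem_size;
}
```

For example, with m = 1000 rows of double complex (16 bytes each), round_up(m, 32) gives ldda = 1024, and every column then starts at a multiple of 16*16 = 256 bytes. With ldda = 1000, the second column starts at byte 16000, which is not a multiple of 256.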