Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
I noticed that for some routines, like zgemm and zgesv, the leading dimensions (LDA) are set to multiples of 32. Is this preferred for some reason?
Calling cudaMalloc aligns the start of the allocated floating-point data so that accesses at the beginning of the data can be fully coalesced. On cards before Fermi, the starting address had to be aligned to 16*sizeof(type) bytes. In a column-major matrix, column j starts j*lda elements after the base pointer, so for this alignment to hold for every column after the first, lda must be divisible by 16. We make the requirement a little stronger (divisibility by 32) in anticipation of hardware changes that may require it.