The reference BLAS (which may or may not be how your BLAS is implemented)
sometimes check for zero entries in the inputs to avoid unnecessary arithmetic
(look at sger.f). I could imagine that someone optimizing the BLAS might
do this for other BLAS implementations as well. For example, it only costs
O(n^2) to count the number of zeros in an n-by-n matrix, which is cheap
compared to the O(n^3) cost of matrix multiplication, and if the number of
zeros is large enough, they might use a "sparse matrix multiply" algorithm.
Interestingly, this means that the BLAS do not propagate exceptions
consistently, e.g., some implementations might multiply 0*inf and get a NaN,
while others may skip the multiplication and never produce the NaN.
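
To make the inconsistency concrete, here is a small sketch in plain Python (not actual BLAS code) of a rank-1 update A := alpha*x*y^T + A written two ways. The function names ger_skip_zeros and ger_no_check are made up for this illustration; the first one imitates the kind of zero check found in sger.f, the second does the arithmetic unconditionally.

```python
import math

def ger_skip_zeros(alpha, x, y, a):
    """Rank-1 update a += alpha * x * y^T, skipping column j when y[j] == 0.

    Hypothetical helper that imitates the zero check in the reference sger.f.
    """
    for j, yj in enumerate(y):
        if yj != 0.0:  # skip the whole column: no arithmetic, so no NaN
            temp = alpha * yj
            for i, xi in enumerate(x):
                a[i][j] += xi * temp
    return a

def ger_no_check(alpha, x, y, a):
    """Same update with no zero check (hypothetical 'other' implementation)."""
    for j, yj in enumerate(y):
        temp = alpha * yj
        for i, xi in enumerate(x):
            a[i][j] += xi * temp  # inf * 0.0 evaluates to NaN here
    return a

x = [math.inf, 1.0]
y = [0.0, 2.0]

skipped = ger_skip_zeros(1.0, x, y, [[0.0, 0.0], [0.0, 0.0]])
unchecked = ger_no_check(1.0, x, y, [[0.0, 0.0], [0.0, 0.0]])

print(skipped[0][0])    # 0.0, because the column for y[0] == 0 was skipped
print(unchecked[0][0])  # nan, because inf * 0.0 was actually computed
```

Both versions agree wherever y[j] is nonzero; they differ only in the entries where the skipped multiplication would have been 0*inf.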