Yes, our kernels are "auto-tuned" (though not fully automatically yet). Some of
our work on auto-tuning the MAGMA BLAS kernels is described in:
Li, Y., Dongarra, J., Tomov, S. "A note on auto-tuning GEMM for GPUs,"
Proc. of ICCS'09, Baton Rouge, LA, UT-CS-09-635, May 25-27, 2009.
Nath, R., Tomov, S., Dongarra, J. "Accelerating GPU Kernels for Dense Linear Algebra,"
Proc. of VECPAR'10, Berkeley, CA, June 22-25, 2010.
We have released the sources for our new SGEMM and DGEMM kernels for Fermi.
The algorithms had to be extended, so the search space for the "automatic" search
for new algorithms was expanded as well (e.g., this is how the SGEMM was derived
from the DGEMM implementation).
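
To give a rough idea of what "searching a space of algorithms" means, here is a minimal sketch in plain Python. It is not the MAGMA tuner: it runs on the CPU, the only tuning parameter is a tile size, and the names blocked_matmul and autotune are made up for this illustration. The real search space for GPU kernels has many more parameters.

```python
import time

def blocked_matmul(A, B, n, tile):
    """C = A * B for n x n lists of lists, processed in tile x tile blocks."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + tile, n)):
                            Ci[j] += a * Bk[j]
    return C

def autotune(n, candidates, repeats=3):
    """Time every candidate tile size; return (fastest_tile, timings)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_matmul(A, B, n, tile)
            best = min(best, time.perf_counter() - t0)
        timings[tile] = best  # keep the best of `repeats` runs
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best_tile, timings = autotune(n=48, candidates=[4, 8, 16, 48])
    print("fastest tile size:", best_tile)
```

The point is only the structure: one parameterized kernel, a list of candidate parameter values, and a timing loop that picks the winner for the machine at hand.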
Note that these kernels can be used to speed up most of the other Level 3 BLAS by expressing
them in terms of GEMMs, or in third-party efforts to develop auto-tuned BLAS.
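
As an illustration of expressing a Level 3 BLAS routine in terms of GEMM, here is a small sketch of a blocked SYRK (lower triangle of C = alpha*A*A^T + beta*C). It is plain Python for readability, not MAGMA code; gemm_nt and syrk_lower_via_gemm are hypothetical names, and gemm_nt stands in for a tuned GEMM kernel. The block size nb is an assumed parameter.

```python
def gemm_nt(alpha, A, B, beta, C, rows, cols, k):
    """C[rows, cols] = alpha * A[rows, :k] * B[cols, :k]^T + beta * C[rows, cols]."""
    for i in rows:
        for j in cols:
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[j][p]
            C[i][j] = alpha * acc + beta * C[i][j]

def syrk_lower_via_gemm(alpha, A, beta, C, nb):
    """Update only the lower triangle of C, using GEMM calls on blocks of size nb."""
    n, k = len(A), len(A[0])
    for j0 in range(0, n, nb):
        jr = range(j0, min(j0 + nb, n))
        # Diagonal block: run a full GEMM on it, then restore the strictly
        # upper entries so the upper triangle of C is left untouched.
        saved = {(i, j): C[i][j] for i in jr for j in jr if j > i}
        gemm_nt(alpha, A, A, beta, C, jr, jr, k)
        for (i, j), value in saved.items():
            C[i][j] = value
        # Everything below the diagonal block is one plain GEMM.
        below = range(j0 + nb, n)
        gemm_nt(alpha, A, A, beta, C, below, jr, k)
    return C

if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    C = [[1.0, 9.0, 9.0], [1.0, 1.0, 9.0], [1.0, 1.0, 1.0]]
    for row in syrk_lower_via_gemm(2.0, A, 1.0, C, nb=2):
        print(row)
```

The diagonal blocks do some redundant work, but for nb much smaller than n almost all of the flops are in the off-diagonal GEMM calls, which is why a fast GEMM carries over to the other routines.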
Stan