Scheduler Specific

  • Polish synchronization

    • Make sure that locking is not excessive

    • Make sure that variables are not volatile if not necessary

  • Find the right criteria for renaming

    • Performance close to out-of-place for codes with anti-dependencies

    • No performance impact for codes without anti-dependencies

  • Look into the performance of loop reversal (ACCUMULATOR)

  • Implement a work stealing policy which does not significantly deteriorate data reuse

  • Consider lock-less implementations of data structures (hash, list)


  • Implement new functionality using the dynamic scheduler

    • "communication avoiding" QR/LQ factorization

    • matrix inversion using Cholesky, LU and QR

Short Term

  • GPU support

    • development of CUDA kernels at the granularity of thread block per tile

    • integration of GPU kernels with the dynamic scheduler

  • Modular and extendable system for testing, timing and autotuning

    • simple template for adding a new routine

    • small number of executables (ideally one)

    • precision generation for multiple precisions

Long Term

  • Mechanism for multithreaded library interoperability

    • sharing of threads between, e.g., PLASMA and multithreaded BLAS

  • Development of a front-end for coding linear algebra algorithms

    • SMPSs-like extensions for annotating tasks and simplifying task queuing

    • extensions for easily accessing elements of matrices stored in tile format

  • Development of SVD and Eigenvalue routines