Yamazaki, I., Dong, T., Tomov, S., Dongarra, J. "Tridiagonalization of a symmetric dense matrix on a GPU cluster," Proceedings of the Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 20, 2013.
Weaver, V., Terpstra, D., Moore, S. "Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations," 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, April 21-23, 2013.
Weaver, V., Terpstra, D., McCraw, H., Johnson, M., Kasichayanula, K., Ralph, J., Nelson, J., Mucci, P., Mohan, T., Moore, S. "PAPI 5: Measuring Power, Energy, and the Cloud," Poster Abstract, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, April 21-23, 2013.
Cao, C., Dongarra, J., Du, P., Gates, M., Luszczek, P., Tomov, S. "clMAGMA: High Performance Dense Linear Algebra with OpenCL," University of Tennessee Computer Science Technical Report (LAWN 275), UT-CS-13-706, March, 2013.
Bouteiller, A., Cappello, F., Dongarra, J., Guermouche, A., Herault, T., Robert, Y. "Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization," University of Tennessee Computer Science Technical Report, ICL-UT-13-01, February 15, 2013.
Baboulin, M., Dongarra, J., Herrmann, J., Tomov, S. "Accelerating linear system solutions using randomization techniques," ACM Transactions on Mathematical Software (TOMS), Vol. 39, No. 2, February, 2013.
Dongarra, J., Herault, T., Robert, Y. "Revisiting the Double Checkpointing Algorithm," University of Tennessee Computer Science Technical Report (LAWN 274), UT-CS-13-705, January 3, 2013.
Ma, T., Bosilca, G., Bouteiller, A., Dongarra, J. "Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms," Journal of Parallel and Distributed Computing, accepted, January, 2013.
Yamazaki, I., Becker, D., Dongarra, J., Druinsky, A., Peled, I., Toledo, S., Ballard, G., Demmel, J., Schwartz, O. "Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures," IPDPS 2013 (submitted), Boston, MA, October, 2012.
Kurzak, J., Luszczek, P., YarKhan, A., Faverge, M., Langou, J., Bouwmeester, H., Dongarra, J. "Multithreading in the PLASMA Library," Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications, Ahmed, M., Ammar, R., Rajasekaran, S. eds. Taylor & Francis, 2013.