I have moved to the KAUST Supercomputing Laboratory in Saudi Arabia, where I currently hold a Computational Scientist position.

PLASMA, FT-LA, MAGMA

Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.**"Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,"** *Concurrency and Computation: Practice and Experience*, Wiley, Vol. 26, No. 7, pp. 1408-1431, May, 2014 [pdf] [bibtex]

Ltaief, H., Luszczek, P., Dongarra, J.**"High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,"** *ACM Transactions on Mathematical Software (TOMS)*, Vol. 39, No. 3, April, 2013 [pdf] [bibtex]

Dongarra, J., Ltaief, H., Luszczek, P., Weaver, V.**"Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture,"** *The 2nd International Conference on Cloud and Green Computing (submitted)*, Xiangtan, Hunan, China, November, 2012 [pdf] [bibtex]

Agullo, E., Bosilca, G., Castagnède, C., Dongarra, J., Ltaief, H., Tomov, S.**"Matrices Over Runtime Systems at Exascale,"** *Supercomputing '12 (poster)*, Salt Lake City, Utah, November, 2012 [bibtex]

Ltaief, H., Luszczek, P., Dongarra, J.**"Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction,"** *Lecture Notes in Computer Science*, Vol. 7203, pp. 661-670, September, 2012 [pdf] [bibtex]

Bosilca, G., Dongarra, J., Ltaief, H.**"Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,"** *Third International Conference on Energy-Aware High Performance Computing*, Hamburg, Germany, September, 2012 [pdf] [bibtex]

Abdelfattah, A., Dongarra, J., Keyes, D., Ltaief, H.**"Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators,"** *VECPAR 2012*, Kobe, Japan, July, 2012 [pdf] [bibtex]

Haidar, A., Ltaief, H., Dongarra, J.**"Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,"** *SIAM Journal on Scientific Computing (Accepted)*, July, 2012 [bibtex]

Haidar, A., Ltaief, H., Luszczek, P., Dongarra, J.**"A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction,"** *IPDPS 2012*, Shanghai, China, May, 2012 [pdf] [bibtex]

Agullo, E., Augonnet, C., Dongarra, J., Faverge, M., Langou, J., Ltaief, H., Tomov, S.**"LU Factorization for Accelerator-based Systems,"** *IEEE/ACS AICCSA 2011*, Sharm-El-Sheikh, Egypt, December, 2011 [pdf] [bibtex]

Haidar, A., Ltaief, H., Dongarra, J.**"Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,"** *Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11)*, Seattle, WA, November 14, 2011 [pdf] [bibtex]

Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.**"High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,"** *Proceedings of MTAGS11*, Seattle, WA, November, 2011 [pdf] [bibtex]

Ltaief, H., Luszczek, P., Dongarra, J.**"Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,"** *International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011)*, Hamburg, Germany, September 7-9, 2011 [pdf] [bibtex]

Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.**"Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,"** *University of Tennessee Computer Science Technical Report (also as a LAWN)*, ICL-UT-11-08, September, 2011 [pdf] [bibtex]

Haidar, A., Ltaief, H., Dongarra, J.**"Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,"** *University of Tennessee Computer Science Technical Report, UT-CS-11-677, (also Lawn 254)*, August 5, 2011 [pdf] [bibtex]

Ltaief, H., Luszczek, P., Dongarra, J.**"High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,"** *University of Tennessee Computer Science Technical Report, UT-CS-11-673, (also Lawn 247)*, May 18, 2011 [pdf] [bibtex]

Luszczek, P., Ltaief, H., Dongarra, J.**"Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,"** *IEEE International Parallel and Distributed Processing Symposium (submitted)*, Anchorage, AK, May 16-20, 2011 [bibtex]

Dongarra, J., Faverge, M., Ltaief, H., Luszczek, P.**"Exploiting Fine-Grain Parallelism in Recursive LU Factorization,"** *Proceedings of PARCO'11*, Gent, Belgium, ICL-UT-11-04, April, 2011 [bibtex]

Haidar, A., Ltaief, H., YarKhan, A., Dongarra, J.**"Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,"** *University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243)*, March 10, 2011 [bibtex]

Haidar, A., Ltaief, H., Dongarra, J.**"Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,"** *Submitted to SIAM Journal on Scientific Computing (SISC)*, 2011 [bibtex]

Agullo, E., Augonnet, C., Dongarra, J., Ltaief, H., Namyst, R., Thibault, S., Tomov, S.**"A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs,"** *in GPU Computing Gems, Jade Edition*, Hwu, W. ed., Elsevier, Vol. 2, pp. 473-484, 2011 [bibtex]

Song, F., Ltaief, H., Hadri, B., Dongarra, J.**"Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems,"** *SC'10*, ACM SIGARCH/IEEE Computer Society, New Orleans, LA, November 13-19, 2010 [pdf] [bibtex]

Haidar, A., Ltaief, H., YarKhan, A., Dongarra, J.**"Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,"** *Submitted to Concurrency and Computation: Practice and Experience*, November 3, 2010 [pdf] [bibtex]

Agullo, E., Augonnet, C., Dongarra, J., Faverge, M., Ltaief, H., Thibault, S., Tomov, S.**"QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,"** *Proceedings of IPDPS 2011*, Anchorage, AK, ICL-UT-10-04, October 1, 2010 [pdf] [bibtex]

Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J.**"Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,"** *University of Tennessee Computer Science Technical Report, UT-CS-10-660*, Sept. 15, 2010 [pdf] [bibtex]

Ltaief, H., Tomov, S., Nath, R., Du, P., Dongarra, J.**"A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators,"** *Proc. of VECPAR'10 (to appear)*, Berkeley, CA, June 22-25, 2010 [pdf] [bibtex]

Song, F., Ltaief, H., Hadri, B., Dongarra, J.**"Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems,"** *University of Tennessee Computer Science Technical Report*, UT-CS-10-653, April, 2010 [pdf] [bibtex]

Ltaief, H., Kurzak, J., Dongarra, J.**"Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,"** *IEEE Transactions on Parallel and Distributed Systems*, pp. 417-423, April, 2010 [pdf] [bibtex]

Ltaief, H., Tomov, S., Nath, R., Dongarra, J.**"Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,"** *IEEE Transactions on Parallel and Distributed Systems (submitted)*, March 26, 2010 [pdf] [bibtex]

Tomov, S., Nath, R., Ltaief, H., Dongarra, J.**"Dense Linear Algebra Solvers for Multicore with GPU Accelerators,"** *Proc. of IPDPS'10*, Atlanta, GA, January 15, 2010 [pdf] [bibtex]

Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.**"Scheduling Dense Linear Algebra Operations on Multicore Processors,"** *Concurrency and Computation: Practice and Experience*, Vol. 22, No. 1, pp. 15-44, January, 2010 [pdf] [bibtex]

Ltaief, H., Kurzak, J., Dongarra, J., Badia, R. M.**"Scheduling Two-sided Transformations using Tile Algorithms on Multicore Architectures,"** *Journal of Scientific Computing*, Vol. 18, No. 1, pp. 33-50, 2010 [pdf] [bibtex]

Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J.**"Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,"** *Innovative Computing Laboratory Technical Report*, ICL-UT-10-02, 2010 [pdf] [bibtex]

Agullo, E., Augonnet, C., Dongarra, J., Ltaief, H., Namyst, R., Thibault, S., Tomov, S.**"Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs,"** *LAPACK Working Note 230*, 2010 [pdf] [bibtex]

Hadri, B., Ltaief, H., Agullo, E., Dongarra, J.**"Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,"** *Submitted to IEEE Transactions on Parallel and Distributed Systems*, December, 2009 [pdf] [bibtex]

Hadri, B., Ltaief, H., Agullo, E., Dongarra, J.**"Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,"** *Accepted at the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010)*, Atlanta, GA, December, 2009 [pdf] [bibtex]

Hadri, B., Ltaief, H., Agullo, E., Dongarra, J.**"Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,"** *Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645)*, ICL-UT-09-03, September 4, 2009 [pdf] [bibtex]

Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.**"Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,"** *PPAM 2009*, Poland, September, 2009 [bibtex]

Ltaief, H., Kurzak, J., Dongarra, J.**"Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,"** *IEEE Transactions on Parallel and Distributed Systems (to appear)*, May, 2009 [pdf] [bibtex]

Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.**"Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects,"** *Journal of Physics: Conference Series*, Vol. 180, 2009 [pdf] [bibtex]

Agullo, E., Hadri, B., Ltaief, H., Dongarra, J.**"Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,"** *2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear)*, 2009 [pdf] [bibtex]

Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.**"Scheduling Linear Algebra Operations on Multicore Processors,"** *University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (Also LAPACK Working Note 213)*, 2009 [pdf] [bibtex]

Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.**"Scheduling Linear Algebra Operations on Multicore Processors,"** *Concurrency and Computation: Practice and Experience (to appear)*, 2009 [bibtex]

Ltaief, H., Kurzak, J., Dongarra, J.**"Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,"** *University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208)*, August 7, 2008 [pdf] [bibtex]

GPU Technology Conference (GTC 2010)

2010-09-20 San Jose, CA

VECPAR

2010-06-22 Berkeley, CA

CUDA Center of Excellence 2010

2010-06-12 Beijing, China

Hybrid Multicore Consortium, First Annual Workshop

2010-01-20 San Francisco, CA

Supercomputing 2009

2009-11-14 Portland, OR

Workshop on Resiliency for Petascale HPC

2009-10-13 Santa Fe, NM

Scientific Discovery through Advanced Computing Meeting

2009-06-14 San Diego, CA

Parallel and Computational Fluid Dynamics

2008-05-19 Lyon, France

Phone 865-974-9985

Office Claxton

University of Tennessee

Computer Science Department

Innovative Computing Laboratory

1122 Volunteer Blvd, Claxton Building

Knoxville, Tennessee 37996-3450

Fax 865-974-8296

