George Bosilca
Innovative Computing Laboratory, University of Tennessee
(865) 974-6321 email: bosilca@eecs.utk.edu
Education and Training:
University of Paris XI Orsay, France | Math and Computer Science | B.S. 1999
University of Paris XI Orsay, France | Computer Science | Ph.D. 2003
University of Tennessee, ICL | Parallel Computing | Post Doc 2004-2005
Research and Professional Experience:
Research Asst. Professor, Innovative Computing Laboratory, University of Tennessee (2007 – present)
Adjunct Assistant Professor, University of Tennessee (2004 – present)
Research Scientist, Innovative Computing Laboratory, University of Tennessee (2005 – 2007)
Sr. Research Assoc., Innovative Computing Laboratory, University of Tennessee (2004 – 2005)
Research Assoc., Innovative Computing Laboratory, University of Tennessee (2003 – 2004)
Synergistic Activities:
• Technical lead, release manager and active member of the Open MPI development team.
• Active member of the MPI Forum.
• Technical lead for the AtomS, System Noise and STCI Software Packages; Technical lead for the Fault Tolerant FT-MPI Library Development; and Technical lead for the MPICH-V
• Architect and Technical Lead for DAGuE / DPLASMA.
Collaborators and Co-editors:
Emmanuel Agullo (INRIA, France), Brad Benton (IBM), Franck Cappello (INRIA Futur, France), Ralph Castain (LANL), D. Cronk (University of Tennessee), J. Dongarra (University of Tennessee), Terry Dontje (Sun/Oracle), G. Fagg (Microsoft), Patrick Geoffray (Myrinet), Brice Goglin (INRIA, France), Rich Graham (ORNL), Thomas Herault (INRIA Futur, France), Yutaka Ishikawa (University of Tokyo), Emmanuel Jeannot (INRIA, France), Andrew Lumsdaine (Indiana University), Christine Morin (INRIA, France), Yves Robert (ENS, Lyon, France), Jeff Squyres (Cisco), Stan Tomov (University of Tennessee)
Graduate and Postdoctoral Advisors and Advisees
Graduate Students (past 5 years):
Daniel Andrzejewski, Thara Angskun, Wesley Bland, Kartheek V. Bodanki, Camille Coti, Jelena Pjesivac-Grbovic, Kusolchu Krerkchai, Narapat Saengpatsa, Gwang Son, Teng Ma, Wei Wu, Anthony Canino, Peter Gaultney, Peng Du
Postdoctoral Associates (past 5 years):
Stephanie Moreaud, Anthony Danalis, Aurelien Bouteiller, Pierre Lemarinier, Yuan Tang
Thesis Advisor:
Dr. Franck Cappello, INRIA Futur, University of Paris XI Orsay and INRIA-Illinois Joint Laboratory on PetaScale Computing.
Publications:
Bland, W., Bosilca, G., Bouteiller, A., Herault, T., Dongarra, J. "A Proposal for User-Level Failure Mitigation in the MPI-3 Standard," University of Tennessee Electrical Engineering and Computer Science Technical Report, UT-CS-12-693, February 24, 2012
Ma, T., Bouteiller, A., Bosilca, G., Dongarra, J. "Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW," 18th EuroMPI, Cotronis, Y., Danalis, A., Nikolopoulos, D., Dongarra, J. eds. Springer, Santorini, Greece, pp. 247-254, September, 2011
Chaarawi, M., Gabriel, E., Keller, R., Graham, R., Bosilca, G., Dongarra, J. "OMPIO: A Modular Software Architecture for MPI I/O," 18th EuroMPI, Cotronis, Y., Danalis, A., Nikolopoulos, D., Dongarra, J. eds. Springer, Santorini, Greece, pp. 81-89, September, 2011
Ma, T., Bosilca, G., Bouteiller, A., Goglin, B., Squyres, J., Dongarra, J. "Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs," Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September, 2011
Du, P., Bouteiller, A., Bosilca, G., Herault, T., Dongarra, J. "Algorithm-based Fault Tolerance for Dense Matrix Factorizations," University of Tennessee Computer Science Technical Report, Knoxville, TN, UT-CS-11-676, August 05, 2011
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Saengpatsa, N., Tomov, S., Dongarra, J. "Performance Portability of a GPU Enabled Factorization with the DAGuE Framework," IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 24, 2011
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Saengpatsa, N., Tomov, S., Dongarra, J. "A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems," IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 16-20, 2011
Bosilca, G., Herault, T., Rezmerita, A., Dongarra, J. "On Scalability for MPI Runtime Systems," University of Tennessee Computer Science Technical Report, Knoxville, TN, ICL-UT-11-05, May 1, 2011
Ma, T., Herault, T., Bosilca, G., Dongarra, J. "Process Distance-aware Adaptive MPI Collective Communications," IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, September, 2011
Ma, T., Bosilca, G., Bouteiller, A., Goglin, B., Squyres, J., Dongarra, J. "Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs," University of Tennessee Computer Science Technical Report, UT-CS-10-663, November, 2010
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J. "Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA," University of Tennessee Computer Science Technical Report, UT-CS-10-660, Sept. 15, 2010
Bosilca, G., Bouteiller, A., Herault, T., Lemarinier, P., Dongarra, J. "Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols," Proceedings of EuroMPI 2010, Jack Dongarra, Michael Resch, Rainer Keller, Edgar Gabriel, eds. Springer, Stuttgart, Germany, September, 2010
Bouteiller, A., Bosilca, G., Dongarra, J. "Redesigning the Message Logging Model for High Performance," Concurrency and Computation: Practice and Experience (online version), June 27, 2010
Turchenko, V., Grandinetti, L., Bosilca, G., Dongarra, J. "Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI," Proceedings of International Conference on Computational Science, ICCS 2010 (to appear), Elsevier, Amsterdam, The Netherlands, June, 2010
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J. "DAGuE: A generic distributed DAG engine for high performance computing," Innovative Computing Laboratory Technical Report, ICL-UT-10-01, April 11, 2010
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Self-Healing Network for Scalable Fault-Tolerant Runtime Environments," Future Generation Computer Systems, Vol. 26, Number 3, pp. 479-485, March, 2010
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J. "Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project," Innovative Computing Laboratory Technical Report, ICL-UT-10-02, 2010
Bosilca, G., Coti, C., Herault, T., Lemarinier, P., Dongarra, J. "Constructing Resiliant Communication Infrastructure for Runtime Environments in Advances in Parallel Computing," in Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, Chapman, B., Desprez, F., Joubert, G., Lichnewsky, A., Peters, F., Priol, T. eds. Volume 19, pp. 441-451, 2010
Lemarinier, P., Bosilca, G., Coti, C., Herault, T., Dongarra, J. "Constructing Resilient Communication Infrastructure for Runtime Environments," ParCo 2009, Lyon, France, September, 2009
Bosilca, G., Coti, C., Herault, T., Lemarinier, P., Dongarra, J. "Constructing resiliant communication infrastructure for runtime environments," Innovative Computing Laboratory Technical Report, ICL-UT-09-02, July 31, 2009
Dongarra, J., Bosilca, G., Delmas, R., Langou, J. "Algorithmic Based Fault Tolerance Applied to High Performance Computing," Journal of Parallel and Distributed Computing, Volume 69, pp. 410-416, 2009
Bosilca, G., Delmas, R., Dongarra, J., Langou, J. "Algorithmic Based Fault Tolerance Applied to High Performance Computing," University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), June 19, 2008
Bouteiller, A., Bosilca, G., Dongarra, J. "Redesigning the Message Logging Model for High Performance," International Supercomputer Conference (ISC 2008), Dresden, Germany, June 17, 2008
Angskun, T., Bosilca, G., Vander Zanden, B., Dongarra, J. "Optimal Routing in Binomial Graph Networks," The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), IEEE Computer Society, Adelaide, Australia, December 3-6, 2007
Angskun, T., Bosilca, G., Dongarra, J. "Self-Healing in Binomial Graph Networks," 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November, 2007
Bouteiller, A., Bosilca, G., Dongarra, J. "Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging," Accepted for Euro PVM/MPI 2007, Springer, September, 2007
Graham, R., Brightwell, R., Barrett, B., Bosilca, G., Pjesivac-Grbovic, J. "An Evaluation of Open MPI's Matching Transport Layer on the Cray XT," EuroPVM/MPI 2007, September, 2007
Angskun, T., Bosilca, G., Dongarra, J. "Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology," Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Springer, Niagara Falls, Canada, August 29-30, 2007
Pjesivac-Grbovic, J., Bosilca, G., Fagg, G., Angskun, T., Dongarra, J. "Decision Trees and MPI Collective Algorithm Selection Problem," Euro-Par 2007, Springer, Rennes, France, pp. 105-115, August, 2007
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," Cluster Computing, Springer Netherlands, Volume 10, Number 2, pp. 127-143, June, 2007
Angskun, T., Bosilca, G., Fagg, G., Pjesivac-Grbovic, J., Dongarra, J. "Reliability Analysis of Self-Healing Network using Discrete-Event Simulation," Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), IEEE Computer Society, pp. 437-444, May, 2007
Graham, R., Bosilca, G., Pjesivac-Grbovic, J. "A Comparison of Application Performance Using Open MPI and Cray MPI," Cray User Group, CUG 2007, May, 2007
Langou, J., Chen, Z., Bosilca, G., Dongarra, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," SIAM SISC (to appear), May, 2007
Buttari, A., Luszczek, P., Kurzak, J., Dongarra, J., Bosilca, G. "SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3," University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595, April 17, 2007
Dongarra, J., Chen, Z., Bosilca, G., Langou, J. "Disaster Survival Guide in Petascale Computing: An Algorithmic Approach," in Petascale Computing: Algorithms and Applications (to appear), Chapman & Hall - CRC Press, 2007
Pjesivac-Grbovic, J., Bosilca, G., Fagg, G., Angskun, T., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," Parallel Computing (Special Edition: EuroPVM/MPI 2006), Elsevier, 2007
Pjesivac-Grbovic, J., Fagg, G., Angskun, T., Bosilca, G., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," Lecture Notes in Computer Science, Springer Berlin / Heidelberg, ICL-UT-06-13, Vol. 4192, Number 2006, pp. 40-48, September, 2006
Fagg, G., Pjesivac-Grbovic, J., Bosilca, G., Angskun, T., Dongarra, J. "Flexible collective communication tuning architecture applied to Open MPI," 2006 Euro PVM/MPI (submitted), Bonn, Germany, September, 2006
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Self-Healing Network for Scalable Fault Tolerant Runtime Environments," DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, September 21-23, 2006
Bosilca, G., Chen, Z., Dongarra, J., Eijkhout, V., Fagg, G., Fuentes, E., Langou, J., Luszczek, P., Pjesivac-Grbovic, J., Seymour, K., You, H., Vadhiyar, S. "Self Adapting Numerical Software SANS Effort," IBM Journal of Research and Development, Volume 50, number 2/3, pp. 223-238, 2006
Pjesivac-Grbovic, J., Fagg, G., Angskun, T., Bosilca, G., Dongarra, J. "MPI Collective Algorithm Selection and Quadtree Encoding," ICL Technical Report, ICL-UT-06-11, 2006
Angskun, T., Fagg, G., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Scalable Fault Tolerant Protocol for Parallel Runtime Environments," 2006 Euro PVM/MPI, Bonn, Germany, ICL-UT-06-12, 2006
Fagg, G., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Scalable Fault Tolerant MPI: Extending the Recovery Algorithm," Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, Di Martino, B. et al. eds. Springer-Verlag Berlin, Sorrento (Naples), Italy, LNCS 3666, pp. 67, September 18-21, 2005
Bosilca, G., Dongarra, J., Fagg, G., Langou, J. "Hash Functions for Datatype Signatures in MPI," Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, Di Martino, B. et al. eds. Springer-Verlag Berlin, Sorrento (Naples), Italy, LNCS 3666, pp. 76-83, September 18-21, 2005
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 4-8, 2005
Chen, Z., Fagg, G., Gabriel, E., Langou, J., Angskun, T., Bosilca, G., Dongarra, J. "Fault Tolerant High Performance Computing by a Coding Approach," Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, June 15-17, 2005
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., Dongarra, J. "Performance Analysis of MPI Collective Operations," Cluster Computing Journal (to appear), 2006
Bosilca, G., Chen, Z., Dongarra, J., Langou, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 2005
Fagg, G., Gabriel, E., Bosilca, G., Angskun, T., Chen, Z., Pjesivac-Grbovic, J., London, K., Dongarra, J. "Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems," Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 23, 2004
Fagg, G., Gabriel, E., Chen, Z., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J. "Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing," International Journal for High Performance Applications and Supercomputing (to appear), April, 2004
Bosilca, G., Chen, Z., Dongarra, J., Langou, J. "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment," ICL Technical Report, ICL-UT-04-04, 2004
Fagg, G., Gabriel, E., Chen, Z., Angskun, T., Bosilca, G., Bukovsky, A., Dongarra, J. "Fault Tolerant Communication Library and Applications for High Performance Computing," Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 27-29, 2003