
Fault-Tolerance Techniques for High-Performance Computing: Computer Communications and Networks

Edited by Thomas Herault, Yves Robert
In English, Hardback – 15 Jul 2015
This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

From the series Computer Communications and Networks

Price: 63160 lei

Old price: 78950 lei
-20%


Book printed on demand


Specifications

ISBN-13: 9783319209425
ISBN-10: 3319209426
Pages: 332
Illustrations: IX, 320 p. 113 illus.
Dimensions: 160 x 241 x 24 mm
Weight: 0.66 kg
Edition: 1st edition 2015
Publisher: Springer
Collection: Computer Communications and Networks
Series: Computer Communications and Networks

Place of publication: Cham, Switzerland

Target audience

Research

Contents

Part I: General Overview.- Fault-Tolerance Techniques for High-Performance Computing.- Part II: Technical Contributions.- Errors and Faults.- Fault-Tolerant MPI.- Using Replication for Resilience on Exascale Systems.- Energy-Aware Checkpointing Strategies.

Back cover text

This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC).
The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
Topics and features:
  • Includes self-contained contributions from an international selection of preeminent experts
  • Provides a survey of resilience methods and performance models
  • Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction
  • Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface
  • Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach
  • Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption
This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.
Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.

Features

  • The first complete overview of this increasingly important field
  • Presents a unique, rigorous approach based on the design of analytical models to predict performance
  • Provides a coherent collection of valuable insights from internationally-renowned experts with considerable expertise
  • Includes supplementary material: sn.pub/extras