The failure detector abstraction

Research output: Contribution to journal › Article › peer-review

Abstract

A failure detector is a fundamental abstraction in distributed computing. This article surveys this abstraction along two dimensions. First, we study failure detectors as building blocks that simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out the timing assumptions needed to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks: we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of these dimensions.
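To make "factoring out timing assumptions" concrete, here is a minimal sketch of one common textbook construction, a heartbeat-based eventually perfect failure detector (often written ◇P), assuming partial synchrony. It is illustrative only and not the article's own code: the class name `EventuallyPerfectFailureDetector`, the simulated clock driven by `tick()`, and the policy of doubling a process's timeout after each false suspicion are all assumptions chosen for the example.

```python
class EventuallyPerfectFailureDetector:
    """Hypothetical heartbeat-based eventually perfect failure detector.

    Assumptions (illustrative, not taken from the article): every correct
    process sends heartbeats, the caller drives a simulated clock through
    tick(), and each false suspicion doubles that process's timeout.
    """

    def __init__(self, processes, initial_timeout=2):
        self.now = 0
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heard = {p: 0 for p in processes}
        self.suspected = set()

    def on_heartbeat(self, p):
        """Record a heartbeat from p; a heartbeat from a suspect proves the suspicion wrong."""
        self.last_heard[p] = self.now
        if p in self.suspected:
            self.suspected.discard(p)
            self.timeout[p] *= 2  # back off: be more patient with p from now on

    def tick(self):
        """Advance the simulated clock one unit and suspect every process silent too long."""
        self.now += 1
        for p, t in self.timeout.items():
            if self.now - self.last_heard[p] > t:
                self.suspected.add(p)
        return set(self.suspected)


# Usage: "a" is correct but slow (heartbeat every 3 ticks); "b" has crashed.
fd = EventuallyPerfectFailureDetector(["a", "b"])
for step in range(1, 21):
    fd.tick()
    if step % 3 == 0:
        fd.on_heartbeat("a")

print(sorted(fd.suspected))  # ['b']
print(fd.timeout["a"])       # 4
```

In this run, "a" is wrongly suspected once at tick 3, its timeout then doubles to 4, and from then on only the crashed "b" stays suspected. An agreement algorithm layered on top would consult only the suspected set, so the timing assumption (an unknown but eventually holding bound on message delay) lives entirely inside the detector. If delays can grow without bound, this adaptive-timeout scheme no longer guarantees that false suspicions eventually stop.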

Original language: English
Article number: 9
Journal: ACM Computing Surveys
Volume: 43
Issue number: 2
Publication status: Published, 1 Jan 2011
Externally published: Yes

Keywords

  • Agreement problem
  • Atomic commit
  • Consensus
  • Distributed system
  • Fault tolerance
  • Liveness
  • Message passing
  • Safety
  • Synchrony

