The fault detection problem

Andreas Haeberlen, Petr Kuznetsov

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

One of the most important challenges in distributed computing is ensuring that services are correct and available despite faults. Recently it has been argued that fault detection can be factored out from computation, and that a generic fault detection service can be a useful abstraction for building distributed systems. However, while fault detection has been extensively studied for crash faults, little is known about detecting more general kinds of faults. This paper explores the power and the inherent costs of generic fault detection in a distributed system. We propose a formal framework that allows us to partition the set of all faults that can possibly occur in a distributed computation into several fault classes. Then we formulate the fault detection problem for a given fault class, and we show that this problem can be solved for only two specific fault classes, namely omission faults and commission faults. Finally, we derive tight lower bounds on the cost of solving the problem for these two classes in asynchronous message-passing systems.

Original languageEnglish
Title of host publicationPrinciples of Distributed Systems - 13th International Conference, OPODIS 2009, Proceedings
Pages99-114
Number of pages16
DOIs
Publication statusPublished - 31 Dec 2009
Externally publishedYes
Event13th International Conference on Principles of Distributed Systems, OPODIS 2009 - Nimes, France
Duration: 15 Dec 200918 Dec 2009

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5923 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference13th International Conference on Principles of Distributed Systems, OPODIS 2009
Country/TerritoryFrance
CityNimes
Period15/12/0918/12/09

Keywords

  • Fault classes
  • Fault detection problem
  • Lower bounds
  • Message complexity

Fingerprint

Dive into the research topics of 'The fault detection problem'. Together they form a unique fingerprint.

Cite this