
FiDe: Reliable and Fast Crash Failure Detection to Boost Datacenter Coordination

  • Davide Rovelli
  • Pavel Chuprikov
  • Philipp Berdesinski
  • Ali Pahlevan
  • Patrick Jahnke
  • Patrick Eugster

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Failure detection is one of the most fundamental primitives on which distributed fault-tolerant services and applications rely to achieve liveness. Typical crash failure detectors resort to using timeouts that have to take into account the unpredictability in interaction times among remote processes, caused by resource contention in the network and in end-host processors. While modern (gray) failure detectors have improved in detecting a wide range of failures, the problem of prohibitively large and unreliable timeouts for crash failures still persists, hampering the performance of both the failure detectors themselves and the modern µs-scale services sitting on top. We propose a novel, fully reliable failure detector (FiDe) that can report the crash of a remote process in a datacenter within less than 30 µs (7.2× faster than the current state of the art) with extremely high reliability, thanks to a ground-up design which provides stable end-to-end process interactions. By reliably lowering worst-case crash detection time, FiDe enables a class of algorithms that can be used to boost coordination services even in the absence of failures. We devise two novel, FiDe-based, highly efficient consensus protocols and integrate them into a key-value store and a synchronization service, improving throughput by up to 2.23× and reducing latency down to 0.46×.

Original language: English
Title of host publication: Proceedings of the 2025 USENIX Annual Technical Conference, ATC 2025
Publisher: USENIX Association
Pages: 765-788
Number of pages: 24
ISBN (Electronic): 9781939133489
Publication status: Published - 1 Jan 2025
Event: 2025 USENIX Annual Technical Conference, ATC 2025 - Boston, United States
Duration: 7 Jul 2025 to 9 Jul 2025

Publication series

Name: Proceedings of the 2025 USENIX Annual Technical Conference, ATC 2025

Conference

Conference: 2025 USENIX Annual Technical Conference, ATC 2025
Country/Territory: United States
City: Boston
Period: 7/07/25 to 9/07/25
