TY - GEN
T1 - State-machine replication for planet-scale systems
AU - Enes, Vitor
AU - Baquero, Carlos
AU - Rezende, Tuanir França
AU - Gotsman, Alexey
AU - Perrin, Matthieu
AU - Sutra, Pierre
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/4/17
Y1 - 2020/4/17
N2 - Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. We experimentally demonstrate that Atlas consistently outperforms state-of-the-art protocols in planet-scale scenarios. In particular, Atlas is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark.
AB - Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. We experimentally demonstrate that Atlas consistently outperforms state-of-the-art protocols in planet-scale scenarios. In particular, Atlas is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark.
KW - consensus
KW - fault tolerance
KW - geo-replication
U2 - 10.1145/3342195.3387543
DO - 10.1145/3342195.3387543
M3 - Conference contribution
AN - SCOPUS:85087105741
T3 - Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020
BT - Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020
PB - Association for Computing Machinery
T2 - 15th European Conference on Computer Systems, EuroSys 2020
Y2 - 27 April 2020 through 30 April 2020
ER -