TY - GEN
T1 - P4ce
T2 - 44th IEEE International Conference on Distributed Computing Systems, ICDCS 2024
AU - Dulong, Rémi
AU - Felber, Nathan
AU - Felber, Pascal
AU - Hopin, Gilles
AU - Lepers, Baptiste
AU - Schiavoni, Valerio
AU - Thomas, Gaël
AU - Vaucher, Sébastien
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - P4ce is the first replication protocol that exhibits the same latency and requires the same network capacity as sending data to a single server. P4ce builds upon previous RDMA-based consensus protocols. Those protocols achieve consensus with a single network round-trip, but at reduced network throughput. P4ce also achieves consensus with a single round-trip, but without degrading throughput, by decoupling the consensus decisions from the RDMA communications. The decision part of the consensus protocol runs on a commodity server, but the communication part of P4ce is fully implemented on a programmable switch, which replicates data and aggregates the acknowledgements in the network, avoiding the throughput bottleneck at the leader. Although simple in principle, the implementation of P4ce raises many challenging issues, notably caused by the complexity of RDMA and the underlying network protocols, the intricacies of packet rewriting during replication and aggregation, and the restricted set of operations that can be implemented at wire speed in the programmable switch. We implemented P4ce and deployed it on a commercially available Intel Tofino switch, achieving up to 4x better throughput and better latency than state-of-the-art consensus protocols.
AB - P4ce is the first replication protocol that exhibits the same latency and requires the same network capacity as sending data to a single server. P4ce builds upon previous RDMA-based consensus protocols. Those protocols achieve consensus with a single network round-trip, but at reduced network throughput. P4ce also achieves consensus with a single round-trip, but without degrading throughput, by decoupling the consensus decisions from the RDMA communications. The decision part of the consensus protocol runs on a commodity server, but the communication part of P4ce is fully implemented on a programmable switch, which replicates data and aggregates the acknowledgements in the network, avoiding the throughput bottleneck at the leader. Although simple in principle, the implementation of P4ce raises many challenging issues, notably caused by the complexity of RDMA and the underlying network protocols, the intricacies of packet rewriting during replication and aggregation, and the restricted set of operations that can be implemented at wire speed in the programmable switch. We implemented P4ce and deployed it on a commercially available Intel Tofino switch, achieving up to 4x better throughput and better latency than state-of-the-art consensus protocols.
KW - consensus
KW - programmable switch
KW - rdma
KW - smr
KW - tofino
UR - https://www.scopus.com/pages/publications/85203178097
U2 - 10.1109/ICDCS60910.2024.00054
DO - 10.1109/ICDCS60910.2024.00054
M3 - Conference contribution
AN - SCOPUS:85203178097
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 508
EP - 519
BT - Proceedings - 2024 IEEE 44th International Conference on Distributed Computing Systems, ICDCS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 July 2024 through 26 July 2024
ER -