TY - GEN
T1 - Fast and Robust Distributed Learning in High Dimension
AU - El-Mhamdi, El Mahdi
AU - Guerraoui, Rachid
AU - Rouault, Sebastien
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - Could a gradient aggregation rule (GAR) for distributed machine learning be both robust and fast? This paper answers in the affirmative through MULTI-BULYAN. Given n workers, f of which are arbitrarily malicious (Byzantine) and m=n-f are not, we prove that MULTI-BULYAN can ensure a strong form of Byzantine resilience, as well as an m/n slowdown, compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning. When m≈n (almost all workers are correct), MULTI-BULYAN reaches the speed of averaging. We also prove that MULTI-BULYAN's cost in local computation is O(d) (like averaging), an important feature for ML where d commonly reaches 10^9, while robust alternatives have at least quadratic cost in d. Our theoretical findings are complemented with an experimental evaluation which, in addition to supporting the linear O(d) complexity argument, conveys the fact that MULTI-BULYAN's parallelisability further adds to its efficiency.
AB - Could a gradient aggregation rule (GAR) for distributed machine learning be both robust and fast? This paper answers in the affirmative through MULTI-BULYAN. Given n workers, f of which are arbitrarily malicious (Byzantine) and m=n-f are not, we prove that MULTI-BULYAN can ensure a strong form of Byzantine resilience, as well as an m/n slowdown, compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning. When m≈n (almost all workers are correct), MULTI-BULYAN reaches the speed of averaging. We also prove that MULTI-BULYAN's cost in local computation is O(d) (like averaging), an important feature for ML where d commonly reaches 10^9, while robust alternatives have at least quadratic cost in d. Our theoretical findings are complemented with an experimental evaluation which, in addition to supporting the linear O(d) complexity argument, conveys the fact that MULTI-BULYAN's parallelisability further adds to its efficiency.
KW - Byzantine Resilience
KW - Distributed Systems
KW - High Dimension
KW - Machine Learning
KW - Non-Convex Optimization
KW - Stochastic Gradient Descent
U2 - 10.1109/SRDS51746.2020.00015
DO - 10.1109/SRDS51746.2020.00015
M3 - Conference contribution
AN - SCOPUS:85097793453
T3 - Proceedings of the IEEE Symposium on Reliable Distributed Systems
SP - 71
EP - 80
BT - Proceedings - 2020 International Symposium on Reliable Distributed Systems, SRDS 2020
PB - IEEE Computer Society
T2 - 39th International Symposium on Reliable Distributed Systems, SRDS 2020
Y2 - 21 September 2020 through 24 September 2020
ER -