TY - GEN
T1 - The hidden vulnerability of distributed learning in Byzantium
AU - El Mhamdi, El Mahdi
AU - Guerraoui, Rachid
AU - Rouault, Sébastien
N1 - Publisher Copyright:
© The Author(s) 2018.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension d ≫ 1, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of Ω(f(d)), where f(d) increases at least like √d. Based on this leeway, we build a simple attack, and experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow O(1/√d) bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
AB - While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension d ≫ 1, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of Ω(f(d)), where f(d) increases at least like √d. Based on this leeway, we build a simple attack, and experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow O(1/√d) bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
M3 - Conference contribution
AN - SCOPUS:85057247992
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 5674
EP - 5686
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -