Abstract
Byzantine-resilient Stochastic Gradient Descent (SGD) aims at shielding model training from Byzantine faults, be they ill-labeled training datapoints, exploited software/hardware vulnerabilities, or malicious worker nodes in a distributed setting. However, two recent attacks have challenged state-of-the-art defenses, often preventing the model from even fitting the training set. The main weakness identified in current defenses is their requirement of a sufficiently low variance-norm ratio for the stochastic gradients. We propose a practical method that, despite increasing the variance, reduces the variance-norm ratio and thereby mitigates this weakness. We assess the effectiveness of our method over 736 different training configurations, comprising the 2 state-of-the-art attacks and 6 defenses. For confidence and reproducibility, each configuration is run 5 times with specified seeds (1 to 5), totalling 3680 runs. In our experiments, when the attack is effective enough to decrease the highest observed top-1 cross-accuracy by at least 20% compared to the unattacked run, our technique systematically raises the highest observed accuracy again, and recovers at least 20% in more than 60% of the cases.
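As a rough illustration of how a method can increase the variance and still lower the variance-norm ratio, the sketch below simulates momentum computed at each worker, the technique named in the paper's title ("Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent"). It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the true gradient is held constant across steps, the noise is isotropic Gaussian, the momentum update is the non-dampened form m_t = beta * m_{t-1} + g_t, the ratio is taken to be the square root of the total variance divided by the norm of the mean, and the function name `simulate` and its parameters are hypothetical.

```python
import math
import random


def simulate(beta, steps=50, workers=1000, dim=5, sigma=1.0, seed=1):
    """Return (total variance, norm of mean, ratio) of the vectors held by
    the simulated workers after `steps` steps.

    beta = 0.0 reduces to plain stochastic gradients (no momentum).
    """
    rng = random.Random(seed)
    true_grad = [1.0] * dim  # assumption: constant true gradient
    momenta = [[0.0] * dim for _ in range(workers)]
    for _ in range(steps):
        for m in momenta:
            for k in range(dim):
                # Stochastic gradient coordinate: true value plus Gaussian noise.
                g = true_grad[k] + rng.gauss(0.0, sigma)
                # Non-dampened momentum, computed independently by each worker.
                m[k] = beta * m[k] + g
    # Empirical mean over workers, coordinate by coordinate.
    mean = [sum(m[k] for m in momenta) / workers for k in range(dim)]
    # Total variance: average squared distance to the empirical mean.
    variance = sum(
        sum((m[k] - mean[k]) ** 2 for k in range(dim)) for m in momenta
    ) / workers
    norm = math.sqrt(sum(x * x for x in mean))
    return variance, norm, math.sqrt(variance) / norm


plain = simulate(beta=0.0)  # workers send raw stochastic gradients
mom = simulate(beta=0.9)    # workers send their momentum vectors
print("variance without / with momentum:", plain[0], mom[0])
print("ratio    without / with momentum:", plain[2], mom[2])
```

Under these assumptions the variance grows by a factor approaching 1 / (1 - beta^2) while the norm of the mean grows by a factor approaching 1 / (1 - beta), so the ratio shrinks by roughly sqrt((1 - beta) / (1 + beta)). Real training differs, since the true gradient changes from step to step, so the figures printed here say nothing about the reported experiments.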
| Original language | English |
|---|---|
| Publication status | Published - 1 Jan 2021 |
| Externally published | Yes |
| Event | 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online, Austria. Duration: 3 May 2021 → 7 May 2021 |
Conference
| Conference | 9th International Conference on Learning Representations, ICLR 2021 |
|---|---|
| Country/Territory | Austria |
| City | Virtual, Online |
| Period | 3/05/21 → 7/05/21 |