AKSEL: Fast Byzantine SGD

  • Amine Boussetta
  • El Mahdi El-Mhamdi
  • Rachid Guerraoui
  • Alexandre Maurer
  • Sébastien Rouault

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Modern machine learning architectures distinguish servers and workers. Typically, a d-dimensional model is hosted by a server and trained by n workers, using a distributed stochastic gradient descent (SGD) optimization scheme. At each SGD step, the goal is to estimate the gradient of a cost function. The simplest way to do this is to average the gradients estimated by the workers. However, averaging is not resilient to even a single Byzantine failure of a worker. Many alternative gradient aggregation rules (GARs) have recently been proposed to tolerate a maximum number f of Byzantine workers. These GARs differ according to (1) their computation-time complexity, (2) the maximal number of Byzantine workers despite which convergence can still be ensured (their breakdown point), and (3) their accuracy, which can be captured by (3.1) their angular error, namely the angle with the true gradient, as well as (3.2) their ability to aggregate full gradients. In particular, many GARs do not output full gradients: they operate on each dimension separately, which results in a coordinate-wise blended gradient, leading to low accuracy in practical situations where the number s of workers that are actually Byzantine in an execution is small (s << f).
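To make the abstract's points concrete, the NumPy sketch below is a minimal, illustrative example (not the Aksel rule itself, whose definition the abstract does not give): one Byzantine worker suffices to reverse the plain average, while a classic coordinate-wise GAR, here the coordinate-wise median, resists the attack but blends coordinates taken from different workers. The dimensions, worker counts, and attack vector are made-up values for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 7  # model dimension and number of workers (illustrative values)

true_grad = np.ones(d)  # the gradient that honest workers estimate (noisily)
honest = true_grad + 0.1 * rng.standard_normal((n - 1, d))
byzantine = np.full((1, d), -1e6)      # one attacker tries to reverse the descent
grads = np.vstack([honest, byzantine])  # n x d matrix of worker gradients

avg = grads.mean(axis=0)               # ruined by the single Byzantine worker
cw_median = np.median(grads, axis=0)   # resilient, but coordinate-wise blended

def angular_error(g, ref):
    """Angle (degrees) between an aggregate g and the true gradient ref."""
    cos = g @ ref / (np.linalg.norm(g) * np.linalg.norm(ref))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print("average      :", avg, f"(angular error {angular_error(avg, true_grad):.1f} deg)")
print("coord. median:", cw_median, f"(angular error {angular_error(cw_median, true_grad):.1f} deg)")
```

With these numbers, the average points in essentially the opposite direction of the true gradient (angular error near 180 degrees), while the coordinate-wise median stays close to it; yet each coordinate of the median may come from a different worker, which is the "blending" the abstract refers to.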

Original language: English
Title of host publication: 24th International Conference on Principles of Distributed Systems, OPODIS 2020
Editors: Quentin Bramas, Rotem Oshman, Paolo Romano
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959771764
Publication status: Published - 1 Jan 2021
Externally published: Yes
Event: 24th International Conference on Principles of Distributed Systems, OPODIS 2020 - Virtual, Online, France
Duration: 14 Dec 2020 - 16 Dec 2020

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 184
ISSN (Print): 1868-8969

Conference

Conference: 24th International Conference on Principles of Distributed Systems, OPODIS 2020
Country/Territory: France
City: Virtual, Online
Period: 14/12/20 - 16/12/20

Keywords

  • Byzantine failures
  • Machine learning
  • Stochastic gradient descent
