Robustness analysis of non-convex stochastic gradient descent using biased expectations

  • Huawei Noah's Ark Lab

Research output: Contribution to journal › Conference article › peer-review

Abstract

This work proposes a novel analysis of stochastic gradient descent (SGD) for non-convex and smooth optimization. Our analysis sheds light on the impact of the probability distribution of the gradient noise on the convergence rate of the norm of the gradient. In the case of sub-Gaussian and centered noise, we prove that, with probability $1 - \delta$, the number of iterations to reach a precision $\varepsilon$ for the squared gradient norm is $O(\varepsilon^{-2} \ln(1/\delta))$. In the case of centered and integrable heavy-tailed noise, we show that, while the expectation of the iterates may be infinite, the squared gradient norm still converges with probability $1 - \delta$ in $O(\varepsilon^{-p} \delta^{-q})$ iterations, where $p, q > 2$. This result shows that heavy-tailed noise on the gradient slows down the convergence of SGD without preventing it, proving that SGD is robust to gradient noise with unbounded variance, a setting of interest for Deep Learning. In addition, it indicates that choosing a step size proportional to $T^{-1/b}$, where $b$ is the tail-parameter of the noise and $T$ is the number of iterations, leads to the best convergence rates. Both results are simple corollaries of a unified analysis using the novel concept of biased expectations, a simple and intuitive mathematical tool to obtain concentration inequalities. Using this concept, we propose a new quantity to measure the amount of noise added to the gradient, and discuss its value in multiple scenarios.
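
The step-size rule in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment: the test function, the symmetric Pareto noise model, the constant `c`, and the identification of the Pareto shape parameter with the tail-parameter $b$ are all assumptions made for illustration. It runs SGD with a fixed step size proportional to $T^{-1/b}$ and reports the smallest squared gradient norm seen along the trajectory.

```python
import random


def grad(x):
    # Gradient of the smooth, non-convex test function f(x) = x^2 / (1 + x^2).
    # (Hypothetical objective chosen for illustration only.)
    d = 1.0 + x * x
    return 2.0 * x / (d * d)


def heavy_tailed_noise(rng, b):
    # Symmetric Pareto noise: centered by symmetry, integrable for b > 1,
    # and of infinite variance for b <= 2.
    # Assumption: the Pareto shape plays the role of the tail-parameter b.
    magnitude = rng.paretovariate(b)
    return magnitude if rng.random() < 0.5 else -magnitude


def sgd(T, b, x0=1.0, c=0.5, seed=0):
    # Fixed step size proportional to T^(-1/b); c is an arbitrary constant.
    rng = random.Random(seed)
    eta = c * T ** (-1.0 / b)
    x = x0
    best_sq_grad = grad(x) ** 2
    for _ in range(T):
        x -= eta * (grad(x) + heavy_tailed_noise(rng, b))
        best_sq_grad = min(best_sq_grad, grad(x) ** 2)
    return eta, best_sq_grad


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        eta, best = sgd(T, b=1.5)
        print(f"T={T:6d}  step size={eta:.5f}  min squared gradient={best:.3e}")
```

A single seeded run like this only shows the mechanics of the schedule. Checking a high-probability statement of the kind made in the abstract would require repeating the run over many seeds and looking at the empirical $1 - \delta$ quantile of the result.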

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
Publication status: Published - 1 Jan 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 2020 to 12 Dec 2020
