Quantitative propagation of Chaos for SGD in wide neural networks

  • Valentin de Bortoli
  • Alain Durmus
  • Xavier Fontaine
  • Umut Şimşekli
  • University of Oxford
  • Université Paris-Saclay
  • Institut Polytechnique de Paris

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) N → +∞. Following a probabilistic approach, we show ‘propagation of chaos’ for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to N of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.
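To make the "particle system" in the abstract concrete, the following is a minimal sketch of discrete-time SGD on a two-layer network in which each hidden neuron is treated as one particle. It is not the authors' code: the tanh activation, the squared loss, the 1/N output scaling, the Gaussian initialization, and the names `mean_field_forward` and `sgd_two_layer` are all assumptions made for illustration, since this page gives none of those details.

```python
import numpy as np


def mean_field_forward(W, a, X):
    """Network output f(x) = (1/N) * sum_i a_i * tanh(w_i . x).

    The 1/N averaging over neurons is an assumed mean-field scaling.
    """
    n_neurons = W.shape[0]
    return np.tanh(X @ W.T) @ a / n_neurons


def sgd_two_layer(X, y, N, steps=500, lr=0.1, batch=16, seed=0):
    """Run minibatch SGD on N particles (w_i, a_i); illustrative only.

    `lr` is a plain constant here; the caller may pass a value that
    depends on N to experiment with N-dependent stepsizes.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((N, d))  # hidden-layer weights, one row per particle
    a = rng.standard_normal(N)       # output weights, one entry per particle
    for _ in range(steps):
        idx = rng.integers(0, X.shape[0], size=batch)
        xb, yb = X[idx], y[idx]
        h = np.tanh(xb @ W.T)        # activations, shape (batch, N)
        resid = h @ a / N - yb       # prediction error, shape (batch,)
        # Gradients of 0.5 * mean(resid**2), multiplied by N so that each
        # particle moves by O(lr) regardless of the network width.
        grad_a = h.T @ resid / batch
        grad_W = (resid[:, None] * (1.0 - h**2) * a[None, :]).T @ xb / batch
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a
```

Calling `sgd_two_layer(X, y, N)` for increasing `N` with the same data, and comparing the empirical distribution of the rows of `W` across runs, is one informal way to explore the kind of large-width behavior the abstract describes.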

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
Publication status: Published - 1 Jan 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 2020 → 12 Dec 2020
