Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Łojasiewicz Condition and Local Smoothness

  • Université PSL
  • Huawei Noah's Ark

Research output: Contribution to journal, Conference article, peer-review

Abstract

Training over-parameterized neural networks involves the empirical minimization of highly non-convex objective functions. Recently, a large body of work has provided theoretical evidence that, despite this non-convexity, properly initialized over-parameterized networks can converge to zero training loss through the introduction of the Polyak-Łojasiewicz condition. However, these analyses are restricted to quadratic losses such as mean squared error, and tend to indicate fast exponential convergence rates that are seldom observed in practice. In this work, we propose to extend these results by analyzing stochastic gradient descent under more generic Łojasiewicz conditions that are applicable to any convex loss function, thus extending the current theory to a broader range of losses commonly used in practice, such as cross-entropy. Moreover, our analysis provides high-probability bounds on the approximation error under sub-Gaussian gradient noise and only requires local smoothness of the objective function, thus making it applicable to deep neural networks in realistic settings.

Original language: English
Pages (from-to): 19310-19327
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 162
Publication status: Published - 1 Jan 2022
Externally published: Yes
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 to 23 Jul 2022
