
Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Łojasiewicz Condition and Local Smoothness

  • Université PSL
  • Huawei Noah's Ark

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

Training over-parameterized neural networks involves the empirical minimization of highly non-convex objective functions. Recently, a large body of work has provided theoretical evidence that, despite this non-convexity, properly initialized over-parameterized networks can converge to zero training loss, by introducing the Polyak-Łojasiewicz condition. However, these analyses are restricted to quadratic losses such as the mean squared error, and they tend to predict fast exponential convergence rates that are seldom observed in practice. In this work, we propose to extend these results by analyzing stochastic gradient descent under more generic Łojasiewicz conditions that are applicable to any convex loss function, thus extending the current theory to a wider range of losses commonly used in practice, such as cross-entropy. Moreover, our analysis provides high-probability bounds on the approximation error under sub-Gaussian gradient noise and only requires local smoothness of the objective function, which makes it applicable to deep neural networks in realistic settings.
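As a toy illustration of the contrast the abstract draws (this is not the paper's analysis or experimental setup), the sketch below runs plain minibatch SGD on a hypothetical over-parameterized linear model with n = 20 random Gaussian samples and d = 50 features, once with the quadratic loss and once with the cross-entropy (logistic) loss. With the quadratic loss f(w) = (1/2n)‖Xw − y‖² and an invertible XXᵀ, the Polyak-Łojasiewicz inequality ½‖∇f(w)‖² ≥ μ(f(w) − f*) holds with μ = λ_min(XXᵀ)/n, so the training loss should decay geometrically. With cross-entropy on linearly separable labels, the infimum 0 is never attained and the loss decays only sublinearly (roughly like 1/t for gradient descent). The data generator, the helper names (`make_data`, `sgd`, `squared`, `logistic`) and the hyperparameters (`lr`, `batch`, `steps`) are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_data(n, d, seed=0):
    # Hypothetical toy data: Gaussian features scaled so that ||x|| is about 1,
    # and scores from a random linear "teacher" w_star (so both tasks are interpolable).
    rng = random.Random(seed)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    xs = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(n)]
    return xs, [dot(x, w_star) for x in xs]


def squared(z, y):
    # Quadratic loss and its derivative with respect to the prediction z.
    return 0.5 * (z - y) ** 2, z - y


def logistic(z, y):
    # Cross-entropy (logistic) loss for labels y in {-1, +1}, computed stably.
    m = y * z
    if m >= 0:
        e = math.exp(-m)
        return math.log1p(e), -y * e / (1.0 + e)
    e = math.exp(m)
    return -m + math.log1p(e), -y / (1.0 + e)


def full_loss(xs, ys, w, loss):
    return sum(loss(dot(x, w), y)[0] for x, y in zip(xs, ys)) / len(xs)


def sgd(xs, ys, loss, lr=1.0, batch=5, steps=2000, record_every=200, seed=1):
    # Plain minibatch SGD from w = 0; returns the full training loss at checkpoints.
    rng = random.Random(seed)
    d = len(xs[0])
    w = [0.0] * d
    history = [full_loss(xs, ys, w, loss)]
    for t in range(1, steps + 1):
        grad = [0.0] * d
        for i in rng.sample(range(len(xs)), batch):
            _, g = loss(dot(xs[i], w), ys[i])
            for j in range(d):
                grad[j] += g * xs[i][j] / batch
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        if t % record_every == 0:
            history.append(full_loss(xs, ys, w, loss))
    return history


xs, scores = make_data(n=20, d=50)
labels = [1.0 if s >= 0 else -1.0 for s in scores]
sq_hist = sgd(xs, scores, squared)   # regression on the teacher's scores
ce_hist = sgd(xs, labels, logistic)  # classification on the teacher's signs
for k, (a, b) in enumerate(zip(sq_hist, ce_hist)):
    print(f"step {k * 200:5d}  squared {a:.3e}  cross-entropy {b:.3e}")
```

On a run like this one would expect the quadratic loss to drop by many orders of magnitude while the cross-entropy loss shrinks far more slowly; the exact figures depend on the seeds and step size.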

Original language: English
Pages (from-to): 19310-19327
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 1 Jan 2022
Externally published: Yes
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 to 23 Jul 2022
