Abstract
This paper studies asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them Adagrad and RMSProp, which are involved in most black-box deep learning methods. We work in the non-convex landscape optimization setting, consider a one-time-scale parametrization, and cover the situations where these algorithms are or are not used with mini-batches. Adopting the point of view of stochastic algorithms, we establish the almost sure convergence of these methods, when run with a decreasing step-size, towards the set of critical points of the target function. Under a mild additional assumption on the noise, we also obtain convergence towards the set of minimizers of the function. Along the way, we derive a “convergence rate” for the methods, in the vein of the work of Ghadimi and Lan (2013).
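For context, a minimal sketch of the two adaptive updates in a stochastic, possibly mini-batched setting is given below; the notation (decreasing step-size $\gamma_k$, stochastic gradient $g_k$, averaging constant $\beta$, damping constant $\varepsilon$) is generic and illustrative, and the paper's exact one-time-scale parametrization may differ.

```latex
% Generic Adagrad / RMSProp iterations with a decreasing step-size \gamma_k.
% g_k denotes a stochastic (possibly mini-batch) gradient of the target
% function at the current iterate x_k; notation is illustrative only.
\begin{aligned}
  v_k     &= v_{k-1} + g_k \odot g_k                         && \text{(Adagrad accumulator)}\\
  v_k     &= \beta\, v_{k-1} + (1-\beta)\, g_k \odot g_k     && \text{(RMSProp accumulator)}\\
  x_{k+1} &= x_k - \gamma_k\, \frac{g_k}{\sqrt{v_k} + \varepsilon}
            && \text{(coordinate-wise adaptive step)}
\end{aligned}
```

The convergence results described in the abstract concern iterations of this adaptive type with decreasing step-sizes $\gamma_k$.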
| Field | Value |
|---|---|
| Original language | English |
| Article number | 228 |
| Journal | Journal of Machine Learning Research |
| Volume | 23 |
| Publication status | Published - 1 Aug 2022 |
| Externally published | Yes |
Keywords
- Convergence of random variables
- Stochastic adaptive algorithm
- Stochastic optimization