Abstract
This paper is devoted to the non-asymptotic analysis of the Ruppert–Polyak averaging method, introduced in Polyak and Juditsky (1992) and Ruppert (1988), for the minimization of a smooth function f with a stochastic algorithm. We first establish a general non-asymptotic optimal bound: if $\hat{\theta}_n$ is the position of the algorithm at step n, we prove that
$$\mathbb{E}\big[\|\hat{\theta}_n - \theta^{\star}\|^2\big] \;\le\; \frac{\operatorname{Tr}(\Sigma^{\star})}{n} + C_{d,f}\, n^{-r_\beta},$$
where $\theta^{\star} = \arg\min f$, $\Sigma^{\star}$ is the limiting covariance matrix of the CLT demonstrated in Polyak and Juditsky (1992), and $C_{d,f}\, n^{-r_\beta}$ is a new state-of-the-art second-order term that reflects the effect of the dimension. We also identify the optimal gain of the baseline SGD, $\gamma_n = \gamma n^{-3/4}$, leading to a second-order term with $r_{3/4} = 5/4$. Second, we show that this result holds under a Kurdyka–Łojasiewicz-type condition (Kurdyka, 1998; Łojasiewicz, 1963) on the function f, which is far more general than the standard uniformly strongly convex case. In particular, it makes it possible to handle pathological examples such as online learning for logistic regression and recursive quantile estimation.
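To make the averaging step concrete, here is a minimal sketch of Ruppert–Polyak averaged SGD with the gain schedule $\gamma_n = \gamma n^{-3/4}$ identified above, applied to recursive quantile estimation, one of the examples mentioned in the abstract. The function names, the gradient oracle, and all constants are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def averaged_sgd(grad_oracle, theta0, n_steps, gamma=1.0, beta=0.75):
    """Ruppert-Polyak averaging: run SGD with gain gamma_n = gamma * n**(-beta)
    and return the running average of the iterates.  A generic sketch under the
    assumptions stated above, not the paper's exact construction."""
    theta = float(theta0)
    theta_bar = theta
    for n in range(1, n_steps + 1):
        g = grad_oracle(theta)                # noisy gradient estimate of f at theta
        theta -= gamma * n ** (-beta) * g     # baseline SGD step with gamma_n = gamma * n**(-3/4)
        theta_bar += (theta - theta_bar) / n  # online average of the SGD trajectory
    return theta_bar

# Recursive quantile estimation: the alpha-quantile of X minimizes
# f(t) = E[(X - t)(alpha - 1_{X < t})], and a noisy (sub)gradient at t is
# 1_{X <= t} - alpha.  The distribution and constants below are illustrative.
rng = np.random.default_rng(0)
alpha = 0.9

def quantile_grad(t):
    return float(rng.standard_normal() <= t) - alpha

q_hat = averaged_sgd(quantile_grad, theta0=0.0, n_steps=200_000)
print(q_hat)  # close to the N(0,1) 0.9-quantile, about 1.2816
```

The running average `theta_bar` is algebraically identical to $\frac{1}{n}\sum_{k=1}^{n}\theta_k$, so the averaged iterate is maintained online at no extra memory cost while the underlying SGD keeps its slowly decreasing gain.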
| Original language | English |
|---|---|
| Pages (from-to) | 312-348 |
| Number of pages | 37 |
| Journal | Stochastic Processes and their Applications |
| Volume | 156 |
| DOIs | |
| Publication status | Published - 1 Feb 2023 |
| Externally published | Yes |
Keywords
- Averaging
- Optimization
- Stochastic Gradient Descent