Optimal non-asymptotic analysis of the Ruppert–Polyak averaging stochastic algorithm

Sébastien Gadat, Fabien Panloup

Research output: Contribution to journal › Article › peer-review

Abstract

This paper is devoted to the non-asymptotic analysis of the Ruppert–Polyak averaging method, introduced in Ruppert (1988) and Polyak and Juditsky (1992), for the minimization of a smooth function f with a stochastic algorithm. We first establish a general non-asymptotic optimal bound: if θ̂ₙ is the position of the algorithm at step n, we prove that

  𝔼[‖θ̂ₙ − θ*‖²] ≤ Tr(Σ)/n + C_{d,f} n^{−r_β},

where Σ is the limiting covariance matrix of the CLT established in Polyak and Juditsky (1992) and C_{d,f} n^{−r_β} is a new state-of-the-art second-order term that reflects the effect of the dimension. We also identify the optimal gain of the baseline SGD, γₙ = γ n^{−3/4}, leading to a second-order term with r_{3/4} = 5/4. Second, we show that this result holds under a Kurdyka–Łojasiewicz-type condition (Kurdyka, 1998; Łojasiewicz, 1963) on the function f, which is far more general than the standard uniformly strongly convex case. In particular, it makes it possible to handle pathological examples such as online learning for logistic regression and recursive quantile estimation.
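To illustrate the procedure the abstract analyzes, here is a minimal sketch (not the paper's own code) of Ruppert–Polyak averaging: run SGD with the gain γₙ = γ n^{−3/4} identified as optimal above, and return the running average of the iterates. The quadratic objective, the noise level, and the helper names are illustrative assumptions.

```python
import numpy as np

def averaged_sgd(grad_oracle, theta0, n_iter, gamma=1.0, beta=0.75, ):
    """Ruppert-Polyak averaging: SGD with step gamma_n = gamma * n**(-beta),
    returning the running mean of the iterates theta_1, ..., theta_n."""
    theta = np.asarray(theta0, dtype=float)
    avg = theta.copy()
    for n in range(1, n_iter + 1):
        theta = theta - gamma * n ** (-beta) * grad_oracle(theta)
        avg += (theta - avg) / n  # online update of the iterate average
    return avg

# Toy example (assumed for illustration): minimize f(theta) = 0.5*||theta||^2
# from noisy gradient observations grad f(theta) + noise.
rng = np.random.default_rng(0)
noisy_grad = lambda theta: theta + 0.1 * rng.standard_normal(theta.shape)
theta_bar = averaged_sgd(noisy_grad, np.ones(5), n_iter=20000)
```

With the minimizer at the origin, the averaged iterate θ̄ₙ ends up much closer to it than a single SGD iterate at the same noise level, which is the variance-reduction effect the Tr(Σ)/n leading term quantifies.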

Original language: English
Pages (from-to): 312-348
Number of pages: 37
Journal: Stochastic Processes and their Applications
Volume: 156
DOIs
Publication status: Published - 1 Feb 2023
Externally published: Yes

Keywords

  • Averaging
  • Optimization
  • Stochastic Gradient Descent
