NONASYMPTOTIC ANALYSIS OF STOCHASTIC GRADIENT DESCENT WITH THE RICHARDSON-ROMBERG EXTRAPOLATION

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations n. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order O(n^{-1/2}) with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order O(n^{-3/4}), where the exponent 3/4 is the best known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
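A minimal sketch of the estimator the abstract describes: constant-step SGD with Polyak-Ruppert averaging run at two step sizes, combined via Richardson-Romberg extrapolation to cancel the leading bias term. The quadratic objective, noise model, step size, and iteration count below are illustrative assumptions, not taken from the paper.

```python
# Sketch: Richardson-Romberg extrapolated, Polyak-Ruppert averaged SGD on a
# strongly convex quadratic. All problem parameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

d = 5
A = np.diag(np.linspace(1.0, 4.0, d))   # f(x) = 0.5 x'Ax - b'x, strongly convex and smooth
b = rng.normal(size=d)
x_star = np.linalg.solve(A, b)          # exact minimizer, used only to report errors

def sgd_average(gamma, n, noise_std=1.0):
    """Run constant-step SGD and return the Polyak-Ruppert average of the iterates."""
    x = np.zeros(d)
    avg = np.zeros(d)
    for k in range(n):
        grad = A @ x - b + noise_std * rng.normal(size=d)  # stochastic gradient oracle
        x = x - gamma * grad
        avg += (x - avg) / (k + 1)       # running average of the iterates
    return avg

n, gamma = 100_000, 0.05
xbar_gamma = sgd_average(gamma, n)       # averaged SGD with step size gamma
xbar_2gamma = sgd_average(2 * gamma, n)  # averaged SGD with step size 2*gamma

# Richardson-Romberg extrapolation: the O(gamma) bias terms of the two runs
# cancel in the combination 2*xbar_gamma - xbar_2gamma.
x_rr = 2 * xbar_gamma - xbar_2gamma

print("error, averaged SGD :", np.linalg.norm(xbar_gamma - x_star))
print("error, RR estimator :", np.linalg.norm(x_rr - x_star))
```

On this toy problem the extrapolated estimate typically has smaller bias than either averaged run alone, at the cost of a mild increase in variance, which is the trade-off the paper quantifies nonasymptotically.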

Original language: English
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 64100-64130
Number of pages: 31
ISBN (Electronic): 9798331320850
Publication status: Published - 1 Jan 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 2025 - 28 Apr 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 - 28/04/25
