
Harder, better, faster, stronger Convergence rates for least-squares regression

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean, finite-variance random error. We present the first algorithm that jointly achieves the optimal prediction error rates for least-squares regression, both in terms of forgetting the initial conditions, in O(1/n^2), and in terms of dependence on the noise and dimension d of the problem, in O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the “optimal” terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass on the data and thus show optimality of our algorithm in a wide variety of particular trade-offs between bias and variance.
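As an illustration of the setting the abstract describes, the following is a minimal sketch of averaged, accelerated, regularized gradient descent on a quadratic with a noisy gradient oracle. It is not the paper's exact recursion: the diagonal Hessian, the momentum schedule (k - 1)/(k + 2), the step size, the regularization toward the starting point, and all function names are assumptions chosen for illustration.

```python
import random


def noisy_gradient(theta, hessian_diag, theta_star, noise_std, rng):
    """Stochastic oracle: exact gradient of the quadratic plus zero-mean Gaussian noise.

    Assumes (for illustration) a diagonal Hessian, so the exact gradient is
    h_i * (theta_i - theta_star_i) in each coordinate.
    """
    return [h * (t - s) + rng.gauss(0.0, noise_std)
            for h, t, s in zip(hessian_diag, theta, theta_star)]


def averaged_accelerated_sgd(hessian_diag, theta_star, theta0, n_steps,
                             step_size, reg=0.0, noise_std=0.1, seed=0):
    """Return the average of the iterates theta_0, ..., theta_n.

    Each step extrapolates with a Nesterov-style momentum term, queries the
    noisy oracle at the extrapolated point, and takes a gradient step on the
    objective regularized by (reg / 2) * ||theta - theta0||^2.
    """
    rng = random.Random(seed)
    theta_prev = list(theta0)
    theta = list(theta0)
    avg = list(theta0)  # running average, starts at theta_0
    for k in range(1, n_steps + 1):
        momentum = (k - 1) / (k + 2)  # assumed schedule, not taken from the paper
        nu = [t + momentum * (t - p) for t, p in zip(theta, theta_prev)]
        grad = noisy_gradient(nu, hessian_diag, theta_star, noise_std, rng)
        new_theta = [v - step_size * (g + reg * (v - t0))
                     for v, g, t0 in zip(nu, grad, theta0)]
        theta_prev, theta = theta, new_theta
        # Online update of the uniform average over theta_0, ..., theta_k.
        avg = [a + (t - a) / (k + 1) for a, t in zip(avg, theta)]
    return avg


def excess_risk(theta, hessian_diag, theta_star):
    """Quadratic objective value above its minimum."""
    return 0.5 * sum(h * (t - s) ** 2
                     for h, t, s in zip(hessian_diag, theta, theta_star))


if __name__ == "__main__":
    hess = [1.0, 0.1]
    star = [1.0, -2.0]
    start = [0.0, 0.0]
    # step_size = 1 / (largest Hessian eigenvalue), an illustrative choice
    avg = averaged_accelerated_sgd(hess, star, start, n_steps=1000, step_size=1.0)
    print("initial excess risk:", excess_risk(start, hess, star))
    print("averaged-iterate excess risk:", excess_risk(avg, hess, star))
```

In this sketch the momentum term is what targets fast forgetting of the initial point, while averaging the iterates is what damps the oracle noise; the `reg` parameter is left at zero by default and can be raised to trade bias for stability.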

Original language: English
Pages (from-to): 1-51
Number of pages: 51
Journal: Journal of Machine Learning Research
Volume: 18
Publication status: Published - 1 Oct 2017

Keywords

  • Accelerated gradient
  • Convex optimization
  • Least-squares regression
  • Non-parametric estimation
  • Stochastic gradient
