Scaling up M-estimation via sampling designs: The Horvitz-Thompson stochastic gradient descent

Stephan Clemencon, Patrice Bertail, Emilie Chautru

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In situations that will undoubtedly become increasingly common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is barely feasible, if feasible at all. A natural approach in this context consists of using survey sampling schemes and replacing the 'full data' statistics with their counterparts based on the resulting random samples, which are of manageable size. The purpose of this paper is to investigate the impact of survey sampling with unequal inclusion probabilities on (stochastic) gradient descent-based M-estimation methods in large-scale statistical learning problems. We prove that, in the presence of some a priori information, one may significantly reduce the number of terms that must be averaged to estimate the gradient at each step with overwhelming probability, while preserving the asymptotic accuracy. These results are stated here as limit theorems.
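The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes Poisson sampling (each observation is included independently with its own inclusion probability), a squared loss whose minimizer is the sample mean, and a step size of 1/t; none of these choices is specified in this record. The names `ht_gradient` and `ht_sgd` are hypothetical. At each step the full-data average gradient is replaced by its Horvitz-Thompson estimate, in which every sampled term is weighted by the inverse of its inclusion probability.

```python
import random


def ht_gradient(theta, xs, probs, rng):
    """Horvitz-Thompson estimate of the average gradient of 0.5 * (theta - x)^2.

    Assumes Poisson sampling: observation i enters the sample independently
    with inclusion probability probs[i] (each must be in (0, 1]).
    """
    total = 0.0
    for x, p in zip(xs, probs):
        if rng.random() < p:
            # Inverse-probability weighting keeps the estimate unbiased.
            total += (theta - x) / p
    return total / len(xs)


def ht_sgd(xs, probs, steps=2000, seed=0):
    """Gradient descent where each gradient is a Horvitz-Thompson estimate."""
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, steps + 1):
        # Step size 1/t is an assumption made for this sketch.
        theta -= (1.0 / t) * ht_gradient(theta, xs, probs, rng)
    return theta


if __name__ == "__main__":
    # Toy data; inclusion probabilities grow with an auxiliary size measure,
    # standing in for the "a priori information" mentioned in the abstract.
    xs = [float(i % 10) for i in range(200)]
    probs = [0.05 + 0.02 * x for x in xs]
    print(ht_sgd(xs, probs))  # should land near the sample mean, 4.5
```

In this toy setting only about 28 of the 200 terms are averaged per step in expectation, yet the iterate still approaches the full-sample minimizer, because the inverse-probability weights keep each gradient estimate unbiased.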

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, Big Data 2014
Editors: Jimmy Lin, Jian Pei, Xiaohua Tony Hu, Wo Chang, Raghunath Nambiar, Charu Aggarwal, Nick Cercone, Vasant Honavar, Jun Huan, Bamshad Mobasher, Saumyadipta Pyne
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 25-30
Number of pages: 6
ISBN (Electronic): 9781479956654
DOIs
Publication status: Published - 1 Jan 2014
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, Big Data 2014 - Washington, United States
Duration: 27 Oct 2014 to 30 Oct 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Conference

Conference: 2nd IEEE International Conference on Big Data, Big Data 2014
Country/Territory: United States
City: Washington
Period: 27/10/14 to 30/10/14

Keywords

  • Horvitz-Thompson estimation
  • sampling design
  • statistical learning
  • stochastic gradient descent
  • survey
