
Sampling and empirical risk minimization

  • Université Paris-Saclay
  • Université Paris-Nanterre
  • Centre de Géosciences

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

In certain situations, which will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full samples is hardly feasible, if not infeasible. A natural approach in this context consists of using survey schemes and substituting the ‘full data’ statistics with their counterparts based on the resulting random samples, of manageable size. The main purpose of this paper is to investigate the impact of survey sampling on statistical learning methods based on empirical risk minimization, through the standard binary classification problem, considered here as a ‘case in point’. Precisely, we prove that, in the presence of auxiliary information, appropriate use of optimally coupled Poisson survey plans may not affect the learning rates much, while possibly reducing significantly the number of terms that must be averaged to compute the empirical risk functional with overwhelming probability. These striking results are next shown to extend to more general sampling schemes by means of a coupling technique originally introduced by Hajek [Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat. 1964;35(4):1491–1523].
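To make the idea of replacing a full-data empirical risk by a survey-based one concrete, here is a minimal sketch, not the paper's method. It draws a Poisson survey plan (each unit kept independently with its own inclusion probability) and estimates the empirical risk of a fixed classifier with the standard Horvitz-Thompson weighting 1/pi_i, so that only the sampled terms are averaged. Everything beyond "Poisson plan" and "empirical risk" is an assumption for illustration: the function names, the toy data, and especially the rule making inclusion probabilities proportional to a hypothetical auxiliary score (this is not the optimal coupling studied in the paper). The fixed-size rejective plans covered by Hajek's coupling are not shown.

```python
import numpy as np

def inclusion_probs(aux, target_size):
    """Hypothetical rule: pi_i proportional to a positive auxiliary score aux_i.

    Scaled so that sum(pi) == target_size before capping at 1; capping can only
    lower the expected sample size sum(pi).
    """
    pi = target_size * aux / aux.sum()
    return np.minimum(pi, 1.0)

def poisson_sample(pi, rng):
    """Poisson survey plan: unit i is kept independently with probability pi[i]."""
    return rng.random(pi.shape[0]) < pi

def ht_empirical_risk(losses, pi, kept):
    """Horvitz-Thompson estimate of the full-data empirical risk mean(losses).

    Each sampled loss is weighted by 1/pi[i]; the result is unbiased for the
    average over all N units, but only the sampled terms are summed.
    """
    n_full = losses.shape[0]
    return np.sum(losses[kept] / pi[kept]) / n_full

# Toy binary classification data (assumption, for illustration only).
rng = np.random.default_rng(0)
N = 100_000
x = rng.standard_normal(N)
y = np.sign(x + 0.5 * rng.standard_normal(N))
g = np.sign(x)                      # a fixed classifier g(x) = sign(x)
losses = (g != y).astype(float)     # 0-1 loss of g on each unit

# Hypothetical auxiliary information: points near the decision boundary are
# assumed more likely to be misclassified, so they get larger inclusion probabilities.
aux = np.exp(-np.abs(x))
pi = inclusion_probs(aux, target_size=5_000)
kept = poisson_sample(pi, rng)

full_risk = losses.mean()
survey_risk = ht_empirical_risk(losses, pi, kept)
print(f"full: {full_risk:.4f}  survey: {survey_risk:.4f}  terms: {kept.sum()} / {N}")
```

In this toy setting roughly 5,000 of the 100,000 losses are averaged, and the survey estimate should land close to the full empirical risk. Setting every pi_i to 1 keeps all units and recovers the full-data risk exactly, which is a quick sanity check on the weighting.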

Original language: English
Pages (from-to): 30-42
Number of pages: 13
Journal: Statistics
Volume: 51
Issue number: 1
Status: Published - 2 Jan 2017
Externally published: Yes
