Robust machine learning by median-of-means: Theory and practice

Research output: Contribution to journal › Article › peer-review

Abstract

Median-of-means (MOM) based procedures have been recently introduced in learning theory (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)). These estimators outperform classical least-squares estimators when data are heavy-tailed and/or corrupted. None of these procedures can be implemented in practice, which is the major issue of current MOM procedures (Ann. Statist. 47 (2019) 783-794). In this paper, we introduce minmax MOM estimators and show that they achieve the same sub-Gaussian deviation bounds as the alternatives (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)), both in small and high-dimensional statistics. In particular, these estimators are efficient under moment assumptions on data that may have been corrupted by a few outliers. Besides these theoretical guarantees, the definition of minmax MOM estimators suggests simple and systematic modifications of standard algorithms used to approximate least-squares estimators and their regularized versions. As a proof of concept, we perform an extensive simulation study of these algorithms for robust versions of the LASSO.
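To illustrate the kind of "simple and systematic modification" the abstract refers to, below is a minimal sketch of a MOM-style proximal gradient (ISTA-like) update for the LASSO: at each iteration the sample is split into blocks, the block whose quadratic loss is the median of the block losses is selected, and the gradient step uses only that block. This is an assumption-laden illustration, not the authors' exact algorithm; the block count K, step size eta, penalty lam and iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mom_lasso(X, y, K=10, lam=0.1, eta=0.01, n_iter=500, seed=0):
    """Sketch of a MOM-modified ISTA for the LASSO (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        # Random equipartition of the sample indices into K blocks.
        blocks = np.array_split(rng.permutation(n), K)
        # Quadratic loss of the current iterate on each block.
        losses = [np.mean((y[b] - X[b] @ beta) ** 2) for b in blocks]
        # Block realizing the median of the block losses.
        med = blocks[int(np.argsort(losses)[K // 2])]
        # Gradient of the squared loss computed on the median block only.
        grad = -2.0 * X[med].T @ (y[med] - X[med] @ beta) / len(med)
        # Proximal (soft-thresholding) step.
        beta = soft_threshold(beta - eta * grad, eta * lam)
    return beta
```

In simulations one would compare such an estimator against the plain LASSO on data containing heavy-tailed noise or a few corrupted observations, which is the spirit of the proof-of-concept study described in the abstract.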

Original language: English
Pages (from-to): 906-931
Number of pages: 26
Journal: Annals of Statistics
Volume: 48
Issue number: 2
DOIs
Publication status: Published - 1 Jan 2020

Keywords

  • Empirical processes
  • High-dimensional statistics
