
Generalization Bounds in the Presence of Outliers: a Median-of-Means Study

  • University of Milano
  • Institut Polytechnique de Paris

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean θ of a square integrable random variable Z around which accurate nonasymptotic confidence bounds can be built, even when Z does not exhibit sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to design training procedures that are not sensitive to atypical observations. More recently, a new line of work has tried to characterize and leverage MoM's ability to deal with corrupted data. In this context, the present work proposes a general study of MoM's concentration properties under the contamination regime, which provides a clear understanding of the impact of the outlier proportion and of the number of blocks chosen. The analysis is extended to (multisample) U-statistics, i.e. averages over tuples of observations, which raise additional challenges due to the dependence induced between the terms. Finally, we show that the latter bounds can be used in a straightforward fashion to derive generalization guarantees for pairwise learning in a contaminated setting, and propose an algorithm to compute provably reliable decision functions.
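
For readers unfamiliar with the estimator, here is a minimal sketch of the standard Median-of-Means construction for a real-valued sample: split the data into blocks, average each block, and take the median of the block means. It is not taken from the paper. The function name `median_of_means`, the parameter `n_blocks`, the random partition, and the choice to drop leftover points are all assumptions made for illustration.

```python
import random
import statistics


def median_of_means(sample, n_blocks, rng=None):
    """Illustrative Median-of-Means estimate of the mean of `sample`.

    Hypothetical helper, not the paper's code: the sample is shuffled,
    cut into `n_blocks` blocks of equal size (leftover points are
    dropped), and the median of the block averages is returned.
    """
    if not 1 <= n_blocks <= len(sample):
        raise ValueError("n_blocks must be between 1 and len(sample)")
    rng = rng or random.Random(0)
    shuffled = list(sample)
    rng.shuffle(shuffled)  # random partition into blocks
    block_size = len(shuffled) // n_blocks
    block_means = [
        statistics.fmean(shuffled[k * block_size:(k + 1) * block_size])
        for k in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    rng = random.Random(42)
    clean = [rng.gauss(1.0, 1.0) for _ in range(990)]  # true mean is 1.0
    outliers = [1e6] * 10                              # 1% contamination
    data = clean + outliers
    print("empirical mean: ", statistics.fmean(data))
    print("median-of-means:", median_of_means(data, n_blocks=31, rng=rng))
```

In this toy run the ten outliers can contaminate at most ten of the 31 blocks, so the median is taken over a majority of clean block means, whereas the empirical mean is dragged far from 1.0. This shows informally why the outlier proportion and the number of blocks interact, which is the trade-off the abstract refers to.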

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 5937-5947
Number of pages: 11
ISBN (Electronic): 9781713845065
Publication status: Published - 1 Jan 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 to 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 to 24/07/21

