MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means

Research output: Contribution to journal › Conference article › peer-review

Abstract

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, however, its classical empirical estimator can be arbitrarily severely affected by even a single outlier when the features are unbounded. To the best of our knowledge, even the consistency of the few existing techniques that attempt to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators of the kernel mean embedding and MMD with strong resistance to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions.
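To illustrate the median-of-means principle discussed in the abstract, the sketch below applies it to mean estimation of an explicit (unbounded) feature map. This is a minimal illustration, not the MONK estimator itself (which works in the RKHS via a minimax formulation); the function name and the feature map φ(x) = (x, x²) are hypothetical choices for demonstration:

```python
import numpy as np

def median_of_means(X, n_blocks, seed=0):
    """Coordinate-wise median-of-means estimate of E[X].

    X: (n, d) array of i.i.d. feature vectors.
    n_blocks: number of disjoint blocks; outliers corrupt at most
    as many block means as there are outliers, and the median
    across blocks discards those corrupted values.
    """
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    blocks = np.array_split(idx, n_blocks)
    block_means = np.stack([X[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

# Demo: estimate the mean embedding under phi(x) = (x, x^2) for
# x ~ N(0, 1), with a single huge outlier. The plain empirical
# mean is wrecked; the median-of-means estimate is barely moved.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=1000)
x[0] = 1e6                          # one outlier
phi = np.stack([x, x**2], axis=1)   # unbounded features

plain = phi.mean(axis=0)            # first coordinate ~ 1000
robust = median_of_means(phi, n_blocks=11)  # close to (0, 1)
```

A single outlier contaminates only one of the eleven blocks, so the median over block means still recovers roughly (E[x], E[x²]) = (0, 1), while the plain average of the first feature is dragged to about 1000.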

Original language: English
Pages (from-to): 3782-3793
Number of pages: 12
Journal: Proceedings of Machine Learning Research
Volume: 97
Publication status: Published - 1 Jan 2019
Externally published: Yes
Event: 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: 9 Jun 2019 - 15 Jun 2019
