Scoring anomalies: A M-estimation formulation

Stéphan Clémençon, Jérémie Jakubowicz

Research output: Contribution to journal › Conference article › peer-review

Abstract

It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. In the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more "abnormal" as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, however, it is desirable to have at one's disposal a scalar-valued "scoring" function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled, and preliminary statistical results related to the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.

Original language: English
Pages (from-to): 659-667
Number of pages: 9
Journal: Journal of Machine Learning Research
Volume: 31
Publication status: Published - 1 Jan 2013
Externally published: Yes
Event: 16th International Conference on Artificial Intelligence and Statistics, AISTATS 2013 - Scottsdale, United States
Duration: 29 Apr 2013 to 1 May 2013
