Abstract
The purpose of this paper is to formulate the issue of scoring multivariate observations according to their degree of abnormality/novelty as an unsupervised learning task. In the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more "abnormal" the farther they lie in the tail(s) of the underlying probability distribution. In a wide variety of applications, however, it is desirable to have a scalar-valued "scoring" function that allows the degree of abnormality of multivariate observations to be compared. Here we formulate the issue of scoring anomalies as an M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled, and preliminary statistical results on the accuracy of partition-based techniques for optimizing empirical estimates of the performance measure are established.
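
To make the idea of a partition-based scalar scoring function concrete, here is a minimal sketch. It is not the method of the paper: it simply assumes a regular grid partition and uses a histogram density estimate as the score, so that low scores flag more "abnormal" points. The function name `fit_histogram_score` and the `bins` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np

def fit_histogram_score(X, bins=10):
    """Illustrative only: piecewise-constant score on a regular grid.

    The score of a point is the histogram density estimate of the
    grid cell containing it; points outside the grid get score 0.
    """
    X = np.asarray(X, dtype=float)
    # density=True normalizes counts by sample size and cell volume
    density, edges = np.histogramdd(X, bins=bins, density=True)

    def score(Z):
        Z = np.atleast_2d(np.asarray(Z, dtype=float))
        inside = np.ones(Z.shape[0], dtype=bool)
        idx = []
        for j, e in enumerate(edges):
            last = len(e) - 2  # index of the last cell along axis j
            i = np.searchsorted(e, Z[:, j], side="right") - 1
            i[Z[:, j] == e[-1]] = last  # right edge belongs to last cell
            inside &= (i >= 0) & (i <= last)
            idx.append(np.clip(i, 0, last))
        out = np.zeros(Z.shape[0])
        out[inside] = density[tuple(k[inside] for k in idx)]
        return out

    return score

# Assumed toy data: a 2-d standard Gaussian sample
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
score = fit_histogram_score(X, bins=10)
s = score([[0.0, 0.0], [10.0, 10.0]])  # a central point and a remote one
```

Any nondecreasing transform of such a score induces the same ordering of observations, which is why only the ranking it produces matters for comparing degrees of abnormality.
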
| Original language | English |
|---|---|
| Pages (from-to) | 659-667 |
| Number of pages | 9 |
| Journal | Journal of Machine Learning Research |
| Volume | 31 |
| Publication status | Published - 1 Jan 2013 |
| Externally published | Yes |
| Event | 16th International Conference on Artificial Intelligence and Statistics, AISTATS 2013 - Scottsdale, United States. Duration: 29 Apr 2013 → 1 May 2013 |