TY - GEN
T1 - Random histogram forest for unsupervised anomaly detection
AU - Putina, Andrian
AU - Sozio, Mauro
AU - Rossi, Dario
AU - Navarro, Jose M.
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11/1
Y1 - 2020/11/1
AB - Roughly speaking, anomaly detection consists of identifying instances whose features deviate significantly from the rest of the input data. It is one of the most widely studied problems in unsupervised machine learning, with applications in network intrusion detection, healthcare, and many other domains. Several methods have been developed in recent years; however, to the best of our knowledge, a satisfactory solution is still missing. We present Random Histogram Forest, an effective approach for unsupervised anomaly detection. Our approach is probabilistic, a property that has proved effective in identifying anomalies. Moreover, it employs the fourth central moment (aka kurtosis) to identify potentially anomalous instances. We conduct an extensive experimental evaluation on 38 datasets, covering, to the best of our knowledge, all benchmarks for anomaly detection, and compare against the most successful algorithms for unsupervised anomaly detection. We evaluate all approaches in terms of average precision (AP), which summarizes the area under the precision-recall curve. Our evaluation shows that our approach significantly outperforms all other approaches in terms of AP while running in linear time.
U2 - 10.1109/ICDM50108.2020.00154
DO - 10.1109/ICDM50108.2020.00154
M3 - Conference contribution
AN - SCOPUS:85100883371
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1226
EP - 1231
BT - Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
A2 - Plant, Claudia
A2 - Wang, Haixun
A2 - Cuzzocrea, Alfredo
A2 - Zaniolo, Carlo
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Data Mining, ICDM 2020
Y2 - 17 November 2020 through 20 November 2020
ER -