TY - GEN
T1 - Tree-based Kendall's τ Maximization for Explainable Unsupervised Anomaly Detection
AU - Kong, Lanfang
AU - Huet, Alexis
AU - Rossi, Dario
AU - Sozio, Mauro
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - We study the problem of building a relatively small regression tree that maximizes Kendall's tau coefficient between the anomaly scores of a source anomaly detection algorithm and those predicted by the tree. We consider a labeling function that assigns to each leaf the inverse of its size, thereby providing satisfactory explanations when comparing examples with different anomaly scores. We show that our approach can be used as a post-hoc model, i.e., to provide global explanations for an existing anomaly detection algorithm. Moreover, it can be used as an in-model approach, i.e., the source anomaly detection algorithm can be replaced altogether. This is made possible by the off-the-shelf transparency of tree-based approaches and by the fact that the explanations provided by our approach do not rely on the source anomaly detection algorithm. The main technical challenge is the efficient computation of Kendall's tau coefficients when determining the best split at each node of the regression tree. We show how such a coefficient can be computed incrementally, making the running time of our algorithm almost linear (up to a logarithmic factor) in the size of the input. Our approach is completely unsupervised, which is appealing when it is difficult to collect a large number of labeled examples. We complement our study with an extensive experimental evaluation against the state of the art, showing the effectiveness of our approach.
U2 - 10.1109/ICDM58522.2023.00126
DO - 10.1109/ICDM58522.2023.00126
M3 - Conference contribution
AN - SCOPUS:85185403303
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1073
EP - 1078
BT - Proceedings - 23rd IEEE International Conference on Data Mining, ICDM 2023
A2 - Chen, Guihai
A2 - Khan, Latifur
A2 - Gao, Xiaofeng
A2 - Qiu, Meikang
A2 - Pedrycz, Witold
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Data Mining, ICDM 2023
Y2 - 1 December 2023 through 4 December 2023
ER -