TY - GEN
T1 - Stream Clustering Robust to Concept Drift
AU - Iglesias, Félix
AU - Konzett, Simon
AU - Zseby, Tanja
AU - Bifet, Albert
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Data streams are everywhere in modern technologies, spanning from industrial process control to network traffic analysis. Stream clustering is required to describe data streams in real time and maintain accurate knowledge of their underlying structures. However, data streams frequently exhibit nonstationarity, changes in distributions, and the emergence of new classes. These alterations, commonly referred to as “concept drift”, severely disturb algorithms, resulting in inconsistent outcomes and models. We present SDOstreamclust, an incremental algorithm for stream clustering. It inherits the distinctive features of methods founded on Sparse Data Observers: it is lightweight, intuitive, self-adjusting, resistant to noise, capable of identifying non-convex clusters, and built on robust parameters and interpretable models. We compare SDOstreamclust with established algorithms and evaluate them on a broad collection of datasets, both real and synthetic. SDOstreamclust shows outstanding performance, strong adaptability to concept drift, and superior parameter stability and robustness. Although often ignored in the evaluation of new methods, concept drift is a major challenge for next-generation algorithms, since it is inherent to evolving data and a main cause of degradation in machine learning. Hence, SDOstreamclust emerges as a strong alternative for unsupervised streaming data analysis.
KW - concept drift
KW - stream clustering
KW - streaming data analysis
UR - https://www.scopus.com/pages/publications/105023964974
U2 - 10.1109/IJCNN64981.2025.11227664
DO - 10.1109/IJCNN64981.2025.11227664
M3 - Conference contribution
AN - SCOPUS:105023964974
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Joint Conference on Neural Networks, IJCNN 2025
Y2 - 30 June 2025 through 5 July 2025
ER -