Online Clustering: Algorithms, Evaluation, Metrics, Applications and Benchmarking

  • Jacob Montiel
  • Hoang Anh Ngo
  • Minh Huong Le-Nguyen
  • Albert Bifet

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Online clustering algorithms play a critical role in data science, offering advantages in time, memory usage and complexity while maintaining high performance compared to traditional clustering methods. This tutorial serves, first, as a survey of online machine learning and, in particular, of data stream clustering methods. State-of-the-art algorithms and the associated core research threads are presented by grouping them into categories based on distance, density grids and hidden statistical models. Clustering validity indices are also investigated in depth: they are an important part of the clustering process, yet they are usually neglected or replaced with classification metrics, which leads to misleading interpretations of the final results. This introduction is then put into context with River, a go-to Python library created by merging Creme and scikit-multiflow. River is also the first open-source project to include an online clustering module that can facilitate reproducibility and allow direct further improvements. Building on this, we propose methods for clustering configuration, applications and benchmarking settings, using real-world problems and datasets.
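
To make the time and memory argument concrete, here is a minimal sketch of a distance-based online clusterer: sequential (MacQueen-style) k-means written in plain Python. It is an illustration only, not River's implementation and not an algorithm taken from the tutorial. The class `OnlineKMeans`, the helper `_sq_dist` and the synthetic two-blob stream are hypothetical; the method names `learn_one` and `predict_one` are borrowed from River's naming convention purely for familiarity. The running `ssq` attribute is one simple example of a label-free (internal) validity measure, as opposed to a classification metric.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class OnlineKMeans:
    """Hypothetical sequential k-means: one pass over the stream,
    memory proportional to n_clusters * n_features."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters
        self.centers = []  # one list of floats per cluster
        self.counts = []   # number of points absorbed by each cluster
        self.ssq = 0.0     # running sum of squared distances to the chosen center

    def predict_one(self, x):
        """Return the index of the nearest center."""
        return min(range(len(self.centers)),
                   key=lambda i: _sq_dist(x, self.centers[i]))

    def learn_one(self, x):
        """Update the model with a single point and return its cluster index."""
        # Assumption: the first n_clusters points seed the centers.
        if len(self.centers) < self.n_clusters:
            self.centers.append(list(x))
            self.counts.append(1)
            return len(self.centers) - 1
        i = self.predict_one(x)
        # Score the point before updating, so ssq never sees the future.
        self.ssq += _sq_dist(x, self.centers[i])
        self.counts[i] += 1
        step = 1.0 / self.counts[i]
        # Incremental mean update: c <- c + (x - c) / n
        self.centers[i] = [c + step * (xi - c)
                           for c, xi in zip(self.centers[i], x)]
        return i


# Synthetic stream (an assumption for the demo): two well-separated
# Gaussian blobs around (0, 0) and (10, 10), arriving alternately.
rng = random.Random(42)
model = OnlineKMeans(n_clusters=2)
for t in range(1000):
    mu = 0.0 if t % 2 == 0 else 10.0
    model.learn_one((rng.gauss(mu, 0.5), rng.gauss(mu, 0.5)))

centers = sorted(model.centers)
print(centers, model.ssq)
```

Each point is seen once and then discarded, so the cost per point is a handful of distance computations regardless of how long the stream runs. Seeding the centers with the first points is the simplest possible choice and is sensitive to arrival order; practical stream clusterers use more robust initialisation and some form of forgetting.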

Original language: English
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 4808-4809
Number of pages: 2
ISBN (Electronic): 9781450393850
Publication status: Published - 14 Aug 2022
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: 14 Aug 2022 to 18 Aug 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 14/08/22 to 18/08/22

Keywords

  • benchmarking
  • data streams
  • decision support
  • online clustering
  • stream clustering
  • stream learning
