C-SMOTE: Continuous Synthetic Minority Oversampling for Evolving Data Streams

  • Alessio Bernardo
  • Heitor Murilo Gomes
  • Jacob Montiel
  • Bernhard Pfahringer
  • Albert Bifet
  • Emanuele Della Valle
  • Politecnico di Milano
  • University of Waikato

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Streaming Machine Learning (SML) studies single-pass learning algorithms that update their models one data item at a time, given an unbounded and often non-stationary flow of data (i.e., in the presence of concept drift). Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we investigate the binary classification problem of rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time. We propose Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel rebalancing meta-strategy to pipeline with SML classification algorithms. C-SMOTE is inspired by the popular SMOTE algorithm but operates continuously. We benchmark C-SMOTE pipelines on ten different groups of data streams. We provide empirical evidence that models learnt with C-SMOTE pipelines outperform models trained on imbalanced data streams without losing the ability to deal with concept drift. Moreover, we show that they outperform other stream balancing techniques from the literature.
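
The abstract does not spell out how C-SMOTE works internally, so the following is only a minimal sketch of the general idea it names: SMOTE-style interpolation applied one sample at a time. Everything in it is an assumption made for illustration and not the authors' method: the class name `WindowedMinorityOversampler`, the fixed sliding window, the known-in-advance minority label, and the parameters `window_size`, `k`, and `max_new_per_step` are all hypothetical.

```python
import random
from collections import deque


class WindowedMinorityOversampler:
    """Illustrative sketch only, NOT the published C-SMOTE algorithm.

    Keeps recent minority samples in a fixed-size window (an assumption)
    and emits SMOTE-style synthetic points until the emitted class counts
    are balanced. Labels are assumed to be 0 or 1.
    """

    def __init__(self, minority_label=1, window_size=100, k=3,
                 max_new_per_step=10, seed=0):
        self.minority = minority_label
        self.majority = 1 - minority_label
        self.window = deque(maxlen=window_size)  # recent real minority samples
        self.emitted = {0: 0, 1: 0}              # real + synthetic counts
        self.k = k
        self.max_new = max_new_per_step          # cap on synthetic points per call
        self.rng = random.Random(seed)

    def _synthesize(self):
        # Pick a base sample, then one of its k nearest neighbours in the window.
        i = self.rng.randrange(len(self.window))
        base = self.window[i]
        others = [x for j, x in enumerate(self.window) if j != i]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
        neighbour = self.rng.choice(others[:self.k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = self.rng.random()
        return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))

    def process(self, x, y):
        """Return the (features, label) pairs to pass on to the learner."""
        out = [(tuple(x), y)]
        self.emitted[y] += 1
        if y == self.minority:
            self.window.append(tuple(x))
        new = 0
        while (self.emitted[self.minority] < self.emitted[self.majority]
               and len(self.window) >= 2 and new < self.max_new):
            out.append((self._synthesize(), self.minority))
            self.emitted[self.minority] += 1
            new += 1
        return out


if __name__ == "__main__":
    sampler = WindowedMinorityOversampler()
    stream = [((0.1, 0.2), 0), ((5.0, 5.1), 1), ((0.3, 0.1), 0),
              ((0.2, 0.4), 0), ((5.2, 4.9), 1)]
    for features, label in stream:
        for sample in sampler.process(features, label):
            print(sample)
```

Each call to `process` returns the original sample followed by any synthetic minority samples, so the output can be fed to any incremental classifier. A fixed window is the simplest way to forget old minority samples; it is a stand-in chosen for brevity, and handling concept drift well would likely need an adaptive mechanism instead.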

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 483-492
Number of pages: 10
ISBN (Electronic): 9781728162515
Publication status: Published - 10 Dec 2020
Externally published: Yes
Event: 8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Online, United States
Duration: 10 Dec 2020 to 13 Dec 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020

Conference

Conference: 8th IEEE International Conference on Big Data, Big Data 2020
Country/Territory: United States
City: Virtual, Online
Period: 10/12/20 to 13/12/20

Keywords

  • Balancing
  • Binary Classification
  • Concept Drift
  • Streaming data
