
C-SMOTE: Continuous Synthetic Minority Oversampling for Evolving Data Streams

  • Alessio Bernardo
  • Heitor Murilo Gomes
  • Jacob Montiel
  • Bernhard Pfahringer
  • Albert Bifet
  • Emanuele Della Valle
  • Politecnico di Milano
  • University of Waikato

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Streaming Machine Learning (SML) studies single-pass learning algorithms that update their models one data item at a time, given an unbounded and often non-stationary flow of data (i.e., in the presence of concept drift). Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we investigate the binary classification problem of rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time. We propose the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel rebalancing meta-strategy to be pipelined with SML classification algorithms. C-SMOTE is inspired by the popular SMOTE algorithm but operates continuously. We benchmark C-SMOTE pipelines on ten different groups of data streams. We bring empirical evidence that models learnt with C-SMOTE pipelines outperform models trained on imbalanced data streams without losing the ability to deal with concept drift. Moreover, we show that they outperform other stream balancing techniques from the literature.
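The abstract describes C-SMOTE only at a high level. As a rough illustration of what "SMOTE, but operating continuously" can mean, the sketch below keeps a bounded window of recent minority samples and, after each incoming item, feeds the learner SMOTE-style synthetic minority samples (interpolated between a minority sample and one of its k nearest minority neighbours) until the running class counts balance. This is not the authors' algorithm: the fixed-size window, the count-based balancing rule, the 0/1 labels, and the names `ContinuousSmoteSketch`, `learn_one`, `k`, and `window_size` are all hypothetical choices made for illustration.

```python
import random
from collections import deque
from typing import Callable, List, Sequence

Vector = List[float]


def k_nearest(point: Sequence[float], candidates: List[Vector], k: int) -> List[Vector]:
    """Return the k candidates closest to `point` (squared Euclidean distance)."""
    def sq_dist(q: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(point, q))
    return sorted(candidates, key=sq_dist)[:k]


def smote_sample(base: Sequence[float], neighbours: List[Vector], rng: random.Random) -> Vector:
    """Classic SMOTE step: pick one neighbour and interpolate a point on the segment."""
    other = rng.choice(neighbours)
    gap = rng.random()  # uniform in [0, 1)
    return [b + gap * (o - b) for b, o in zip(base, other)]


class ContinuousSmoteSketch:
    """Hypothetical streaming rebalancer; NOT the authors' C-SMOTE implementation.

    Assumptions: binary labels 0/1, a fixed minority label, and a fixed-size
    window of recent minority samples (no explicit drift detection).
    """

    def __init__(self, learn_one: Callable[[Vector, int], None], k: int = 5,
                 window_size: int = 500, minority_label: int = 1, seed: int = 0):
        self.learn_one = learn_one        # callback that trains the downstream model
        self.k = k
        self.minority_label = minority_label
        self.window = deque(maxlen=window_size)  # recent *real* minority samples only
        self.counts = {0: 0, 1: 0}        # running counts, synthetic samples included
        self.rng = random.Random(seed)

    def process(self, x: Sequence[float], y: int) -> int:
        """Train on one real item, then top up the minority class.

        Returns the number of synthetic samples generated for this item.
        """
        self.learn_one(list(x), y)
        self.counts[y] += 1
        if y == self.minority_label:
            self.window.append(list(x))
        majority = self.counts[1 - self.minority_label]
        generated = 0
        # Interpolation needs at least two minority points in the window.
        while self.counts[self.minority_label] < majority and len(self.window) >= 2:
            pool = list(self.window)
            base = self.rng.choice(pool)
            others = [p for p in pool if p is not base]
            synth = smote_sample(base, k_nearest(base, others, self.k), self.rng)
            self.learn_one(synth, self.minority_label)
            self.counts[self.minority_label] += 1
            generated += 1
        return generated


if __name__ == "__main__":
    learned = []
    rebalancer = ContinuousSmoteSketch(lambda x, y: learned.append((x, y)), k=2, window_size=10)
    rebalancer.process([0.0, 0.0], 1)
    rebalancer.process([1.0, 1.0], 1)
    for i in range(6):
        rebalancer.process([5.0 + i, 5.0], 0)
    print(rebalancer.counts)
```

Counting synthetic samples toward the minority total keeps the loop bounded: once the counts are balanced, each new majority item triggers at most one synthetic sample. A drift-aware variant would also need to shrink or reset the window and the counts when the stream changes, which this sketch does not attempt.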

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 483-492
Number of pages: 10
ISBN (Electronic): 9781728162515
State: Published - 10 Dec 2020
Externally published: Yes
Event: 8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Online, United States
Duration: 10 Dec 2020 → 13 Dec 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020

Conference

Conference: 8th IEEE International Conference on Big Data, Big Data 2020
Country/Territory: United States
City: Virtual, Online
Period: 10/12/20 → 13/12/20
