
A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams

  • Université Paris-Saclay
  • INRIA Saclay, Laboratoire de Recherche en Informatique (LRI), Université Paris Sud

Research output: Chapter in a book, report, anthology or collection › Conference contribution › Peer-reviewed

Abstract

Classification is a well-known learning task in big data stream mining. Although it has been studied extensively in the offline setting, it remains a challenge in the streaming setting, where data evolve and may even be infinite. In the offline setting, training requires storing all the data in memory; in the streaming setting, this is impossible because of the massive amount of data generated in real time. To cope with these resource constraints, this paper proposes and analyzes several evolving naive Bayes classification algorithms based on the well-known count-min sketch, in order to minimize the space needed to store the training data. The proposed algorithms also adapt concept drift approaches, such as ADWIN, to deal with the fact that streaming data may evolve and change over time. However, handling sparse, very high-dimensional data in such a framework is highly challenging. Therefore, we include the hashing trick, a dimensionality reduction technique, to compress such data into a lower-dimensional space, which leads to large memory savings. We give a theoretical analysis demonstrating that our proposed algorithms achieve accuracy similar to that of classical big data stream mining algorithms while using a reasonable amount of resources. We validate these theoretical results with an extensive evaluation on both synthetic and real-world datasets.
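To make the combination of a count-min sketch, naive Bayes and the hashing trick more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes categorical features given as a name-to-value dictionary, uses salted built-in `hash()` as a stand-in for a pairwise-independent hash family, and omits drift handling (ADWIN) entirely. The class and method names (`CountMinSketch`, `SketchNaiveBayes`, `learn_one`, `predict_one`) and the default parameters (`n_buckets`, `width`, `depth`, `n_values`, `alpha`) are hypothetical.

```python
import math
import random
import zlib
from collections import defaultdict


class CountMinSketch:
    """depth x width counters; estimate() never undercounts a key's true total."""

    def __init__(self, width=2048, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # One random salt per row; salting Python's hash() stands in for a
        # pairwise-independent hash family (a simplification in this sketch).
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _columns(self, key):
        return [hash((salt, key)) % self.width for salt in self.salts]

    def add(self, key, count=1):
        for row, col in enumerate(self._columns(key)):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions can only add to a counter, so the row minimum is the best estimate.
        return min(self.table[row][col] for row, col in enumerate(self._columns(key)))


class SketchNaiveBayes:
    """Hypothetical categorical naive Bayes storing (class, feature, value) counts in a sketch."""

    def __init__(self, n_buckets=1024, n_values=2, alpha=1.0, **sketch_args):
        self.n_buckets = n_buckets  # hashing-trick dimension (assumed default)
        self.n_values = n_values    # assumed number of distinct values per feature
        self.alpha = alpha          # Laplace smoothing constant
        self.sketch = CountMinSketch(**sketch_args)
        self.class_counts = defaultdict(int)  # few classes, so counted exactly
        self.n_seen = 0

    def _hash_features(self, features):
        # Hashing trick: map each feature name to one of n_buckets indices.
        return [(zlib.crc32(name.encode()) % self.n_buckets, value)
                for name, value in features.items()]

    def learn_one(self, features, label):
        self.n_seen += 1
        self.class_counts[label] += 1
        for index, value in self._hash_features(features):
            self.sketch.add((label, index, value))

    def predict_one(self, features):
        hashed = self._hash_features(features)
        best_label, best_score = None, -math.inf
        for label, class_count in self.class_counts.items():
            score = math.log(class_count / self.n_seen)  # log prior
            for index, value in hashed:
                # A joint count cannot exceed its class count, so clip sketch overestimates.
                joint = min(self.sketch.estimate((label, index, value)), class_count)
                score += math.log((joint + self.alpha) /
                                  (class_count + self.n_values * self.alpha))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = SketchNaiveBayes()
for i in range(200):
    model.learn_one({"free": 1, "offer": i % 2, "meeting": 0}, "spam")
    model.learn_one({"free": 0, "offer": 0, "meeting": 1}, "ham")
print(model.predict_one({"free": 1, "offer": 1, "meeting": 0}))  # spam
```

The memory used by the sketch is fixed at `width * depth` counters regardless of how many distinct (class, feature, value) triples arrive, which is the property the abstract relies on; the price is that estimates may overcount when keys collide. An evolving variant along the lines described above would additionally monitor the stream (for example with ADWIN) and reset or reweight the counts when drift is detected; that part is not shown here.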

Original language: English
Title: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 604-613
Number of pages: 10
ISBN (Electronic): 9781538650356
DOIs
State: Published - 2 Jul 2018
Externally published: Yes
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 10/12/18 to 13/12/18
