Adaptive Window Strategy for Topic Modeling in Document Streams

  • Pierre Alexandre Murena
  • Marie Al-Ghossein
  • Talel Abdessalem
  • Antoine Cornuejols

Research output: Chapter in a book, report, anthology, or collection › Conference contribution › Peer-reviewed

Abstract

Extracting global themes from written text has recently become a major issue for computational intelligence, in particular in the Natural Language Processing community. Among the proposed solutions, Latent Dirichlet Allocation (LDA) has attracted considerable interest, and several variants have been proposed to adapt it to changing environments. With the emergence of data streams, for instance from social media, the domain faces a new challenge: topic extraction in real time. In this paper, we propose a simple approach called Adaptive Window based Incremental LDA (AWILDA), which originates from a cross-over between LDA and state-of-the-art methods in data stream mining. We train new topic models only when a drift is detected and select training data on the fly using the ADWIN algorithm. We provide both theoretical guarantees for our method and experimental validation on artificial and real-world data.
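The "retrain only when a drift is detected" idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each incoming document is summarized by a hypothetical fit score in [0, 1], and it uses a simplified ADWIN-style window: it applies the ADWIN cut threshold to every split of a plain list instead of the compressed bucket structure of the real algorithm. The class name, the `min_side` guard, and the `max_size` cap are illustrative choices.

```python
import math


class SimpleAdaptiveWindow:
    """Simplified ADWIN-style detector (no bucket compression, O(n) per update)."""

    def __init__(self, delta=0.002, max_size=500, min_side=5):
        self.delta = delta        # confidence parameter of the cut test
        self.max_size = max_size  # hypothetical cap to bound memory
        self.min_side = min_side  # hypothetical minimum size of each sub-window
        self.window = []

    def update(self, value):
        """Add a value in [0, 1]; return True if old data was dropped (drift)."""
        self.window.append(value)
        if len(self.window) > self.max_size:
            self.window.pop(0)
        drift = False
        while self._cut_once():
            drift = True
        return drift

    def _cut_once(self):
        n = len(self.window)
        if n < 2 * self.min_side:
            return False
        total = sum(self.window)
        head = 0.0
        delta_prime = self.delta / n
        for i in range(1, n):
            head += self.window[i - 1]
            n0, n1 = i, n - i
            if n0 < self.min_side or n1 < self.min_side:
                continue
            # ADWIN threshold: m is the harmonic-mean term of both sub-window sizes.
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                # Simplification: drop the whole older sub-window at once.
                self.window = self.window[i:]
                return True
        return False


def retrain_points(scores, detector):
    """Return the stream positions at which a new topic model would be trained.

    In a full pipeline, the retained detector.window would select the recent
    documents used to fit the new model; that step is omitted here.
    """
    return [t for t, s in enumerate(scores) if detector.update(s)]


# Synthetic stream: the score level shifts abruptly at position 200.
stream = [0.2] * 200 + [0.8] * 200
points = retrain_points(stream, SimpleAdaptiveWindow())
print(points)
```

On a stationary stream the window simply grows and no retraining is triggered. After the shift, the older part of the window is discarded, so the retained data reflects only the new regime.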

Original language: English
Title of host publication: 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509060146
DOIs
Publication status: Published - 10 Oct 2018
Externally published: Yes
Event: 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Rio de Janeiro, Brazil
Duration: 8 Jul 2018 to 13 Jul 2018

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2018-July

Conference

Conference: 2018 International Joint Conference on Neural Networks, IJCNN 2018
Country/Territory: Brazil
City: Rio de Janeiro
Period: 8/07/18 to 13/07/18