TY - GEN
T1 - Adaptive Window Strategy for Topic Modeling in Document Streams
AU - Murena, Pierre Alexandre
AU - Al-Ghossein, Marie
AU - Abdessalem, Talel
AU - Cornuejols, Antoine
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/10/10
Y1 - 2018/10/10
N2 - Extracting global themes from written text has recently become a major issue for computational intelligence, in particular in the Natural Language Processing community. Among the proposed solutions, Latent Dirichlet Allocation (LDA) has attracted wide interest, and several variants have been proposed to adapt it to changing environments. With the emergence of data streams, for instance from social media, the domain faces a new challenge: topic extraction in real time. In this paper, we propose a simple approach called Adaptive Window based Incremental LDA (AWILDA), which combines LDA with state-of-the-art methods from data stream mining. We train new topic models only when a drift is detected and select training data on the fly using the ADWIN algorithm. We provide both theoretical guarantees for our method and experimental validation on artificial and real-world data.
AB - Extracting global themes from written text has recently become a major issue for computational intelligence, in particular in the Natural Language Processing community. Among the proposed solutions, Latent Dirichlet Allocation (LDA) has attracted wide interest, and several variants have been proposed to adapt it to changing environments. With the emergence of data streams, for instance from social media, the domain faces a new challenge: topic extraction in real time. In this paper, we propose a simple approach called Adaptive Window based Incremental LDA (AWILDA), which combines LDA with state-of-the-art methods from data stream mining. We train new topic models only when a drift is detected and select training data on the fly using the ADWIN algorithm. We provide both theoretical guarantees for our method and experimental validation on artificial and real-world data.
U2 - 10.1109/IJCNN.2018.8489771
DO - 10.1109/IJCNN.2018.8489771
M3 - Conference contribution
AN - SCOPUS:85056490837
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 International Joint Conference on Neural Networks, IJCNN 2018
Y2 - 8 July 2018 through 13 July 2018
ER -