TY - GEN
T1 - Harnessing truth discovery algorithms on the topic labelling problem
AU - Sanjaya Er, Ngurah Agus
AU - Abdessalem, Talel
AU - Lamine Ba, Mouhamadou
AU - Bressan, Stéphane
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/11/19
Y1 - 2018/11/19
AB - Topics in topic modelling approaches are represented as a collection of weighted words. The labels for the topics, however, are not clearly defined and must be interpreted manually. Topic labelling aims to label the topics automatically by leveraging a knowledge base or applying data mining and machine learning algorithms. We propose a naive topic labelling approach in which we transform the labelling problem into selecting the best label for each word in the topic. The candidate labels are generated by querying a knowledge base using the top-N words of each topic. We construct a heterogeneous graph of topics, words, articles and candidate labels. To rank the candidate labels, we apply truth discovery algorithms on the graph. The performance evaluation using popular topic modelling datasets shows that the approach achieves satisfactory accuracy.
KW - Evaluation
KW - Ranking
KW - Topic labelling
KW - Truth discovery
KW - Truth finding
UR - https://www.scopus.com/pages/publications/85061156961
U2 - 10.1145/3282373.3282390
DO - 10.1145/3282373.3282390
M3 - Conference contribution
AN - SCOPUS:85061156961
T3 - ACM International Conference Proceeding Series
SP - 8
EP - 14
BT - 20th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2018 - Proceedings
A2 - Anderst-Kotsis, Gabriele
A2 - Pardede, Eric
A2 - Steinbauer, Matthias
A2 - Indrawan-Santiago, Maria
A2 - Salvadori, Ivan Luiz
A2 - Khalil, Ismail
PB - Association for Computing Machinery
T2 - 20th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2018
Y2 - 19 November 2018 through 21 November 2018
ER -