TY - GEN
T1 - Weakly Supervised Short Text Categorization Using World Knowledge
AU - Türker, Rima
AU - Zhang, Lei
AU - Alam, Mehwish
AU - Sack, Harald
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020/1/1
Y1 - 2020/1/1
AB - Short text categorization is an important task in many NLP applications, such as sentiment analysis, news feed categorization, etc. Due to the sparsity and shortness of the text, many traditional classification models perform poorly if they are directly applied to short text. Moreover, supervised approaches require large amounts of manually labeled data, which is a costly, labor-intensive, and time-consuming task. This paper proposes a weakly supervised short text categorization approach, which does not require any manually labeled data. The proposed model consists of two main modules: (1) a data labeling module, which leverages an external Knowledge Base (KB) to compute probabilistic labels for a given unlabeled training data set, and (2) a classification model based on a Wide & Deep learning approach. The effectiveness of the proposed method is validated via evaluation on multiple datasets. The experimental results show that the proposed approach outperforms unsupervised state-of-the-art classification approaches and achieves comparable performance to supervised approaches.
KW - Short text categorization
KW - Weakly supervised short text categorization
KW - Wide & Deep model
U2 - 10.1007/978-3-030-62419-4_33
DO - 10.1007/978-3-030-62419-4_33
M3 - Conference contribution
AN - SCOPUS:85096617959
SN - 9783030624187
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 584
EP - 600
BT - The Semantic Web – ISWC 2020 - 19th International Semantic Web Conference, 2020, Proceedings
A2 - Pan, Jeff Z.
A2 - Tamma, Valentina
A2 - d’Amato, Claudia
A2 - Janowicz, Krzysztof
A2 - Fu, Bo
A2 - Polleres, Axel
A2 - Seneviratne, Oshani
A2 - Kagal, Lalana
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th International Semantic Web Conference, ISWC 2020
Y2 - 2 November 2020 through 6 November 2020
ER -