Weakly Supervised Short Text Categorization Using World Knowledge

Rima Türker, Lei Zhang, Mehwish Alam, Harald Sack

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Short text categorization is an important task in many NLP applications, such as sentiment analysis and news feed categorization. Because short texts are sparse and provide little context, many traditional classification models perform poorly when applied to them directly. Moreover, supervised approaches require large amounts of manually labeled data, and producing such data is costly, labor-intensive, and time-consuming. This paper proposes a weakly supervised short text categorization approach that does not require any manually labeled data. The proposed model consists of two main modules: (1) a data labeling module, which leverages an external Knowledge Base (KB) to compute probabilistic labels for a given unlabeled training data set, and (2) a classification model based on a Wide & Deep learning approach. The effectiveness of the proposed method is validated via evaluation on multiple datasets. The experimental results show that the proposed approach outperforms unsupervised state-of-the-art classification approaches and achieves performance comparable to that of supervised approaches.
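The two modules described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the probabilistic labels are computed, so the sketch assumes, purely for illustration, that entities mentioned in a text and the candidate categories share one embedding space and that labels come from a softmax over averaged cosine similarities. All names (`ENTITY_VECS`, `CATEGORY_VECS`, `probabilistic_label`, `wide_and_deep_logits`, `soft_cross_entropy`), the toy vectors, and the `temperature` parameter are hypothetical. The classifier part shows only the general Wide & Deep idea, a linear "wide" component and a small feed-forward "deep" component whose outputs are summed, together with a cross-entropy loss against soft labels.

```python
import math

# Hypothetical toy embeddings standing in for knowledge-base vectors.
# The numbers are invented for illustration only.
ENTITY_VECS = {
    "Lionel Messi": [0.9, 0.1, 0.0],
    "Apple Inc.": [0.1, 0.9, 0.2],
    "NASDAQ": [0.0, 0.8, 0.3],
}
CATEGORY_VECS = {
    "Sports": [1.0, 0.0, 0.0],
    "Business": [0.0, 1.0, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probabilistic_label(entities, temperature=0.1):
    """Assumed labeling rule: average entity-category similarity, then softmax."""
    cats = list(CATEGORY_VECS)
    known = [ENTITY_VECS[e] for e in entities if e in ENTITY_VECS]
    if not known:
        # No recognized entity: fall back to a uniform distribution.
        return {c: 1.0 / len(cats) for c in cats}
    scores = [
        sum(cosine(v, CATEGORY_VECS[c]) for v in known) / len(known) / temperature
        for c in cats
    ]
    return dict(zip(cats, softmax(scores)))


def wide_and_deep_logits(wide_x, deep_x, w_wide, w_hidden, w_out):
    """Per-class logits: linear wide part plus a one-hidden-layer ReLU deep part."""
    hidden = [max(0.0, sum(x * w for x, w in zip(deep_x, unit))) for unit in w_hidden]
    return [
        sum(x * w for x, w in zip(wide_x, w_wide[k]))
        + sum(h * w for h, w in zip(hidden, w_out[k]))
        for k in range(len(w_wide))
    ]


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy between predicted probabilities and a soft label distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs))


# Usage: label one short text by its (already extracted) entities, then score
# an untrained toy model against that soft label.
labels = probabilistic_label(["Apple Inc.", "NASDAQ"])
logits = wide_and_deep_logits(
    wide_x=[1.0, 0.0],
    deep_x=[0.5, -0.5],
    w_wide=[[0.2, 0.1], [-0.2, 0.3]],
    w_hidden=[[1.0, 0.0], [0.0, 1.0]],
    w_out=[[1.0, 1.0], [0.5, 0.5]],
)
loss = soft_cross_entropy(logits, [labels["Sports"], labels["Business"]])
```

Training against the full label distribution rather than a single hard label lets the classifier see how confident the labeling step was, which is one common reason to keep weak labels probabilistic.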

Original language: English
Title of host publication: The Semantic Web – ISWC 2020 - 19th International Semantic Web Conference, 2020, Proceedings
Editors: Jeff Z. Pan, Valentina Tamma, Claudia d’Amato, Krzysztof Janowicz, Bo Fu, Axel Polleres, Oshani Seneviratne, Lalana Kagal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 584-600
Number of pages: 17
ISBN (Print): 9783030624187
Publication status: Published - 1 Jan 2020
Externally published: Yes
Event: 19th International Semantic Web Conference, ISWC 2020 - Athens, Greece
Duration: 2 Nov 2020 to 6 Nov 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12506 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Semantic Web Conference, ISWC 2020
Country/Territory: Greece
City: Athens
Period: 2/11/20 to 6/11/20

Keywords

  • Short text categorization
  • Weakly supervised short text categorization
  • Wide & Deep model
