Regularizing text categorization with clusters of words

Konstantinos Skianis, François Rousseau, Michalis Vazirgiannis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Regularization is a critical step in supervised learning, not only to address overfitting but also to take into account any prior knowledge we may have on the features and their dependence. In this paper, we explore state-of-the-art structured regularizers and we propose novel ones based on clusters of words from LSI topics, word2vec embeddings and graph-of-words document representation. We show that our proposed regularizers are faster than the state-of-the-art ones and still improve text classification accuracy. Code and data are available online.
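The abstract names structured regularizers built from clusters of words but does not spell out a penalty or solver. The sketch below is illustrative only and assumes a non-overlapping group lasso, with one group per word cluster weighted by the square root of its size, applied to binary logistic regression and trained by proximal gradient descent (ISTA). It is not the authors' released code, and the penalties and optimizer used in the paper may differ. The cluster assignments are taken as given; per the abstract they could come from LSI topics, word2vec embeddings or a graph-of-words representation. All function names, the toy data and the hyperparameters are hypothetical.

```python
import math

# Hypothetical sketch: non-overlapping group lasso over word clusters.
# Each group is a list of feature (word) indices; features that belong to
# no group are left unpenalized.


def group_lasso_penalty(w, groups, lam):
    """Omega(w) = lam * sum_g sqrt(|g|) * ||w_g||_2."""
    return lam * sum(
        math.sqrt(len(g)) * math.sqrt(sum(w[j] ** 2 for j in g)) for g in groups
    )


def group_soft_threshold(w, groups, thresh):
    """Proximal operator of thresh * sum_g sqrt(|g|) * ||w_g||_2.

    Block soft-thresholding: shrinks each cluster's weight vector toward
    zero and sets the whole cluster to exactly zero when its norm is small.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in g))
        t = thresh * math.sqrt(len(g))
        scale = 0.0 if norm <= t else 1.0 - t / norm
        for j in g:
            out[j] = scale * w[j]
    return out


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_group_lasso(X, y, groups, lam=0.1, step=0.1, epochs=200):
    """ISTA for mean logistic loss + group lasso; labels y are in {0, 1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the mean logistic loss.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = _sigmoid(sum(wj * xij for wj, xij in zip(w, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step, then proximal step on the group penalty.
        w = [wj - step * gj for wj, gj in zip(w, grad)]
        w = group_soft_threshold(w, groups, step * lam)
    return w


# Toy usage (hypothetical data): words 0-1 form a cluster that occurs only in
# positive documents; words 2-3 form a cluster spread evenly over both classes.
docs = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
labels = [1, 1, 0, 0]
clusters = [[0, 1], [2, 3]]
weights = train_logreg_group_lasso(docs, labels, clusters, lam=0.2)
print(weights)
```

On this toy input the uninformative cluster (words 2 and 3) is driven to exactly zero as a block, while the informative cluster keeps equal non-zero weights. This all-or-nothing selection at the cluster level is what separates a group penalty from a plain L1 penalty. ISTA is used here only because it is the simplest correct solver for this objective.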

Original language: English
Title of host publication: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 1827-1837
Number of pages: 11
ISBN (Electronic): 9781945626258
Publication status: Published - 1 Jan 2016
Event: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016 - Austin, United States
Duration: 1 Nov 2016 → 5 Nov 2016

Publication series

Name: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Country/Territory: United States
City: Austin
Period: 1/11/16 → 5/11/16
