TY - GEN
T1 - Regularizing text categorization with clusters of words
AU - Skianis, Konstantinos
AU - Rousseau, François
AU - Vazirgiannis, Michalis
N1 - Publisher Copyright: © 2016 Association for Computational Linguistics
PY - 2016/1/1
Y1 - 2016/1/1
AB - Regularization is a critical step in supervised learning to not only address overfitting, but also to take into account any prior knowledge we may have on the features and their dependence. In this paper, we explore state-of-the-art structured regularizers and we propose novel ones based on clusters of words from LSI topics, word2vec embeddings and graph-of-words document representation. We show that our proposed regularizers are faster than the state-of-the-art ones and still improve text classification accuracy. Code and data are available online.
U2 - 10.18653/v1/d16-1188
DO - 10.18653/v1/d16-1188
M3 - Conference contribution
AN - SCOPUS:85058038526
T3 - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 1827
EP - 1837
BT - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Y2 - 1 November 2016 through 5 November 2016
ER -