TY - GEN
T1 - Orthogonal Matching Pursuit for Text Classification
AU - Skianis, Konstantinos
AU - Tziortziotis, Nikolaos
AU - Vazirgiannis, Michalis
N1 - Publisher Copyright: © 2018 Association for Computational Linguistics.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping Group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models.
M3 - Conference contribution
AN - SCOPUS:85068313074
T3 - 4th Workshop on Noisy User-Generated Text, W-NUT 2018 - Proceedings of the Workshop
SP - 93
EP - 103
BT - 4th Workshop on Noisy User-Generated Text, W-NUT 2018 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 4th Workshop on Noisy User-Generated Text, W-NUT 2018
Y2 - 1 November 2018
ER -