TY - GEN
T1 - Boosting Tricks for Word Mover’s Distance
AU - Skianis, Konstantinos
AU - Malliaros, Fragkiskos D.
AU - Tziortziotis, Nikolaos
AU - Vazirgiannis, Michalis
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover’s Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors, and convex metric learning constitute powerful tools that can boost WMD.
KW - Text classification
KW - Word embeddings
KW - Word mover’s distance
U2 - 10.1007/978-3-030-61616-8_61
DO - 10.1007/978-3-030-61616-8_61
M3 - Conference contribution
AN - SCOPUS:85094160315
SN - 9783030616151
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 761
EP - 772
BT - Artificial Neural Networks and Machine Learning – ICANN 2020 - 29th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Artificial Neural Networks, ICANN 2020
Y2 - 15 September 2020 through 18 September 2020
ER -