TY - GEN
T1 - Geometry and Analogies
T2 - 7th International Conference on Statistical Language and Speech Processing, SLSP 2019
AU - Khalife, Sammy
AU - Liberti, Leo
AU - Vazirgiannis, Michalis
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - In this paper we discuss the well-known claim that language analogies yield almost parallel vector differences in word embeddings. On the one hand, using the best-known publicly available word embeddings, we show that this property, while it does hold for a handful of cases, fails to hold in general, especially in high dimensions. On the other hand, we show that this property is not crucial for basic natural language processing tasks such as text classification. We achieve this with a simple algorithm that yields updated word embeddings in which this property holds: we show that with these word representations, text classification tasks achieve about the same performance.
AB - In this paper we discuss the well-known claim that language analogies yield almost parallel vector differences in word embeddings. On the one hand, using the best-known publicly available word embeddings, we show that this property, while it does hold for a handful of cases, fails to hold in general, especially in high dimensions. On the other hand, we show that this property is not crucial for basic natural language processing tasks such as text classification. We achieve this with a simple algorithm that yields updated word embeddings in which this property holds: we show that with these word representations, text classification tasks achieve about the same performance.
U2 - 10.1007/978-3-030-31372-2_9
DO - 10.1007/978-3-030-31372-2_9
M3 - Conference contribution
AN - SCOPUS:85075896418
SN - 9783030313715
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 100
EP - 111
BT - Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Proceedings
A2 - Martín-Vide, Carlos
A2 - Purver, Matthew
A2 - Pollak, Senja
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 14 October 2019 through 16 October 2019
ER -