TY - JOUR
T1 - Further results on latent discourse models and word embeddings
AU - Khalife, Sammy
AU - Gonçalves, Douglas
AU - Allouah, Youssef
AU - Liberti, Leo
N1 - Publisher Copyright:
©2021 Sammy Khalife, Douglas Gonçalves, Youssef Allouah, Leo Liberti.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - We discuss some properties of generative models for word embeddings. Namely, (Arora et al., 2016) proposed a latent discourse model implying the concentration of the partition function of the word vectors. This concentration phenomenon led to an asymptotic linear relation between the pointwise mutual information (PMI) of pairs of words and the scalar product of their vectors. Here, we first revisit this concentration phenomenon and prove it under slightly weaker assumptions, for a set of random vectors symmetrically distributed around the origin. Second, we empirically evaluate the relation between PMI and scalar products of word vectors satisfying the concentration property. Our empirical results indicate that, in practice, this relation does not hold with arbitrarily small error. This observation is further supported by two theoretical results: (i) the error cannot be exactly zero because the corresponding shifted PMI matrix cannot be positive semidefinite; (ii) under mild assumptions, there exist pairs of words for which the error cannot be close to zero. We deduce that either natural language does not follow the assumptions of the considered generative model, or the current word vector generation methods do not allow the construction of the hypothesized word embeddings.
AB - We discuss some properties of generative models for word embeddings. Namely, (Arora et al., 2016) proposed a latent discourse model implying the concentration of the partition function of the word vectors. This concentration phenomenon led to an asymptotic linear relation between the pointwise mutual information (PMI) of pairs of words and the scalar product of their vectors. Here, we first revisit this concentration phenomenon and prove it under slightly weaker assumptions, for a set of random vectors symmetrically distributed around the origin. Second, we empirically evaluate the relation between PMI and scalar products of word vectors satisfying the concentration property. Our empirical results indicate that, in practice, this relation does not hold with arbitrarily small error. This observation is further supported by two theoretical results: (i) the error cannot be exactly zero because the corresponding shifted PMI matrix cannot be positive semidefinite; (ii) under mild assumptions, there exist pairs of words for which the error cannot be close to zero. We deduce that either natural language does not follow the assumptions of the considered generative model, or the current word vector generation methods do not allow the construction of the hypothesized word embeddings.
KW - Asymptotic concentration
KW - Generative models
KW - Latent variable models
KW - Matrix factorization
KW - Natural language processing
M3 - Article
AN - SCOPUS:85121108715
SN - 1532-4435
VL - 22
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -