TY - GEN
T1 - Learning with noise-contrastive estimation
T2 - 27th International Conference on Computational Linguistics, COLING 2018
AU - Labeau, Matthieu
AU - Allauzen, Alexandre
N1 - Publisher Copyright: © 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Noise-Contrastive Estimation (NCE) is a learning criterion that is regularly used to train neural language models in place of Maximum Likelihood Estimation, since it avoids the computational bottleneck caused by the output softmax. In this paper, we analyse and explain some of the weaknesses of this objective function, linked to the mechanism of self-normalization, by closely monitoring comparative experiments. We then explore several remedies and modifications to propose tractable and efficient NCE training strategies. In particular, we propose to make the scaling factor a trainable parameter of the model, and to use the noise distribution to initialize the output bias. These solutions, though simple, yield stable and competitive performance on both small- and large-scale language modelling tasks.
UR - https://www.scopus.com/pages/publications/85100467620
M3 - Conference contribution
AN - SCOPUS:85100467620
T3 - COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
SP - 3090
EP - 3101
BT - COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
A2 - Bender, Emily M.
A2 - Derczynski, Leon
A2 - Isabelle, Pierre
PB - Association for Computational Linguistics (ACL)
Y2 - 20 August 2018 through 26 August 2018
ER -