Learning word embeddings: Unsupervised methods for fixed-size representations of variable-length speech segments

Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad, Emmanuel Dupoux

Research output: Contribution to journal › Conference article › peer-review

Abstract

Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method, supplemented with length information, can be competitive with the variable-length input feature representation on both evaluations. Recurrent autoencoders trained without supervision can yield even better results, at the expense of increased computational complexity.
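To make the downsampling idea concrete, the sketch below samples a fixed number of frames at evenly spaced positions from a variable-length feature sequence and appends the segment length, yielding a fixed-size vector. This is a minimal illustration of the approach named in the abstract; the frame count, interpolation scheme, and feature type are assumptions, not the paper's exact configuration.

```python
import numpy as np

def downsample_embedding(features, n_samples=10):
    """Fixed-size embedding of a variable-length feature sequence.

    Samples n_samples frames at evenly spaced positions and appends
    the segment length, as described in the abstract. Illustrative
    sketch only: the paper's exact sampling scheme is not shown here.

    features: (T, D) array, e.g. T frames of D-dimensional acoustic
    features. Returns a flat vector of size n_samples * D + 1.
    """
    T, D = features.shape
    # Evenly spaced frame indices spanning the whole segment.
    idx = np.linspace(0, T - 1, n_samples).round().astype(int)
    sampled = features[idx]                      # (n_samples, D)
    # Supplement the embedding with length information.
    return np.concatenate([sampled.ravel(), [T]])

# Example: a 73-frame segment of 13-dimensional features.
segment = np.random.randn(73, 13)
emb = downsample_embedding(segment)
print(emb.shape)  # (131,) -> 10 * 13 + 1
```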

Original language: English
Pages (from-to): 2683-2687
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOIs
Publication status: Published - 1 Jan 2018
Externally published: Yes
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sept 2018 - 6 Sept 2018

Keywords

  • ABX discrimination
  • Audio word embeddings
  • Representation learning
  • Same-different classification
  • Unsupervised speech processing

