Hierarchical pre-training for sequence labelling in spoken dialog

  • Emile Chapuis
  • Pierre Colombo
  • Matteo Manica
  • Matthieu Labeau
  • Chloe Clavel
  • Institut Polytechnique de Paris
  • IBM GBS France
  • S.

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key component of spoken dialog systems. In this work, we propose a new approach to learn generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 different datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles, a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate that hierarchical encoders achieve competitive results with consistently fewer parameters than state-of-the-art models, and we show their importance for both pre-training and fine-tuning.

Original language: English
Title: Findings of the Association for Computational Linguistics Findings of ACL
Subtitle: EMNLP 2020
Publisher: Association for Computational Linguistics (ACL)
Pages: 2636-2648
Number of pages: 13
ISBN (Electronic): 9781952148903
Status: Published - 1 Jan 2020
Event: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 - Virtual, Online
Duration: 16 Nov 2020 to 20 Nov 2020

Publication series

Name: Findings of the Association for Computational Linguistics Findings of ACL: EMNLP 2020

Conference

Conference: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020
City: Virtual, Online
Period: 16/11/20 to 20/11/20
