
Speech self-supervised representations benchmarking: A case for larger probing heads

  • Salah Zaiem
  • Youcef Kemiche
  • Titouan Parcollet
  • Slim Essid
  • Mirco Ravanelli
  • CNRS LTCI
  • Capgemini Engineering
  • Samsung AI Center - Cambridge
  • Concordia University

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we found that altering the downstream architecture structure leads to significant fluctuations in the performance ranking of the evaluated models. Against common practices in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization, and multi-level feature exploitation.
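To make the probing setup concrete, here is a minimal sketch of a small probing head over frozen SSL features. It is an illustration only, not the architecture evaluated in the article: the learned softmax weighting of layers, the mean pooling over time, and every name (`weighted_layer_sum`, `mean_pool`, `linear_head`, `probe`) are assumptions chosen for the example. Only the layer logits and the linear head would be trained; the SSL features are treated as fixed inputs.

```python
import math


def softmax(logits):
    """Turn unnormalized layer logits into weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def weighted_layer_sum(layers, layer_logits):
    """Combine frozen features from several SSL layers.

    layers: list of L layers, each a list of T frames, each a list of D floats.
    layer_logits: L trainable scalars (hypothetical probe parameters).
    Returns a single sequence of T frames of dimension D.
    """
    alphas = softmax(layer_logits)
    num_frames = len(layers[0])
    dim = len(layers[0][0])
    return [
        [sum(a * layer[t][d] for a, layer in zip(alphas, layers)) for d in range(dim)]
        for t in range(num_frames)
    ]


def mean_pool(frames):
    """Average over time to get one utterance-level vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def linear_head(vector, weight, bias):
    """Smallest possible probing head: one affine map to the class scores."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


def probe(layers, layer_logits, weight, bias):
    """Frozen features -> layer mixing -> pooling -> linear classifier."""
    mixed = weighted_layer_sum(layers, layer_logits)
    return linear_head(mean_pool(mixed), weight, bias)
```

A larger-capacity head in the sense of the abstract would replace `linear_head` with a deeper model while keeping the SSL features frozen, which changes both the parameter count and the inference cost of the probe.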

Original language: English
Article number: 101695
Journal: Computer Speech and Language
Volume: 89
Publication status: Published - 1 Jan 2025
Externally published: Yes
