MULTILINGUAL LYRICS-TO-AUDIO ALIGNMENT

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Lyrics-to-audio alignment methods have recently reported impressive results, opening the door to practical applications such as karaoke and within-song navigation. However, most studies focus on a single language, usually English, for which annotated data are abundant. Their ability to generalize to other languages, especially in low (or even zero) training resource scenarios, has so far been left unexplored. In this paper, we address the lyrics-to-audio alignment task in a generalized multilingual setup. More precisely, this investigation presents the first attempt (to the best of our knowledge) to create a language-independent lyrics-to-audio alignment system. Building on a Recurrent Neural Network (RNN) model trained with a Connectionist Temporal Classification (CTC) algorithm, we study the relevance of different intermediate representations, either character or phoneme, along with several strategies to design a training set. The evaluation is conducted on multiple languages with a varying amount of data available, from plenty to zero. Results show that learning from diverse data and using a universal phoneme set as an intermediate representation yield the best generalization performance.

Original language: English
Title of host publication: Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020
Editors: Julie Cumming, Jin Ha Lee, Brian McFee, Markus Schedl, Johanna Devaney, Cory McKay, Eva Zangerle, Timothy de Reuse
Publisher: International Society for Music Information Retrieval
Pages: 512-519
Number of pages: 8
ISBN (Electronic): 9780981353708
Publication status: Published - 1 Jan 2020
Event: 21st International Society for Music Information Retrieval Conference, ISMIR 2020 - Virtual, Online, Canada
Duration: 11 Oct 2020 to 16 Oct 2020

Publication series

Name: Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

Conference

Conference: 21st International Society for Music Information Retrieval Conference, ISMIR 2020
Country/Territory: Canada
City: Virtual, Online
Period: 11/10/20 to 16/10/20
