
CONTENT BASED SINGING VOICE SOURCE SEPARATION VIA STRONG CONDITIONING USING ALIGNED PHONEMES

Research output: Chapter in book, report, anthology, or collection › Conference contribution › Peer-reviewed

Abstract

Informed source separation has recently gained renewed interest with the introduction of neural networks and the availability of large multitrack datasets containing both the mixture and the separated sources. These approaches use prior information about the target source to improve separation. Historically, Music Information Retrieval researchers have focused primarily on score-informed source separation, but more recent approaches explore lyrics-informed source separation. However, because of the lack of multitrack datasets with time-aligned lyrics, models use weak conditioning with non-aligned lyrics. In this paper, we present a multimodal multitrack dataset with lyrics aligned in time at the word level with phonetic information, and we explore strong conditioning using the aligned phonemes. Our model follows a U-Net architecture and takes as input both the magnitude spectrogram of a musical mixture and a matrix with aligned phonetic information. The phoneme matrix is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers. These layers condition the U-Net feature maps to adapt the separation process to the presence of different phonemes via affine transformations. We show that phoneme conditioning can be successfully applied to improve singing voice source separation.
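
The conditioning step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the plain linear embedding, the function names `embed` and `film`, the weight values, and the tensor layout (channels x frequency bins x time frames) are all assumptions chosen for illustration. It only shows the general FiLM idea, in which an aligned phoneme matrix yields per-channel, per-frame scale (gamma) and shift (beta) parameters that modulate a feature map through an affine transformation.

```python
def embed(phonemes, weights):
    """Linear map from a (n_phonemes x T) activation matrix to (C x T) parameters.

    `weights` has shape (C x n_phonemes); this stands in for a learned embedding.
    """
    n_frames = len(phonemes[0])
    return [
        [sum(w * phonemes[p][t] for p, w in enumerate(row)) for t in range(n_frames)]
        for row in weights
    ]


def film(features, gamma, beta):
    """Affine modulation: out[c][f][t] = gamma[c][t] * features[c][f][t] + beta[c][t]."""
    return [
        [
            [gamma[c][t] * value + beta[c][t] for t, value in enumerate(freq_row)]
            for freq_row in channel
        ]
        for c, channel in enumerate(features)
    ]


# Toy sizes (assumed): 2 phonemes, 3 time frames, 2 channels, 2 frequency bins.
phonemes = [
    [1.0, 0.0, 0.0],  # phoneme 0 is active in frame 0
    [0.0, 1.0, 0.0],  # phoneme 1 is active in frame 1; frame 2 has no phoneme
]
w_gamma = [[2.0, 0.5], [1.0, 3.0]]   # hypothetical weights, shape (C, n_phonemes)
w_beta = [[0.5, 0.0], [0.0, -1.0]]   # hypothetical weights, shape (C, n_phonemes)
features = [
    [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],  # channel 0: two frequency bins over time
    [[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]],  # channel 1
]

gamma = embed(phonemes, w_gamma)   # scale per channel and frame
beta = embed(phonemes, w_beta)     # shift per channel and frame
conditioned = film(features, gamma, beta)
print(conditioned)
```

In this toy setup the frame with no active phoneme receives gamma = 0 and beta = 0, so its features are zeroed. A practical embedding would likely include bias terms and nonlinearities so that such frames are still modulated sensibly.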

Original language: English
Title: Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020
Editors: Julie Cumming, Jin Ha Lee, Brian McFee, Markus Schedl, Johanna Devaney, Cory McKay, Eva Zangerle, Timothy de Reuse
Publisher: International Society for Music Information Retrieval
Pages: 109-116
Number of pages: 8
ISBN (Electronic): 9780981353708
Publication status: Published - 1 Jan 2020
Event: 21st International Society for Music Information Retrieval Conference, ISMIR 2020 - Virtual, Online, Canada
Duration: 11 Oct 2020 to 16 Oct 2020

Publication series

Name: Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

Conference

Conference: 21st International Society for Music Information Retrieval Conference, ISMIR 2020
Country/Territory: Canada
City: Virtual, Online
Period: 11/10/20 to 16/10/20
