A FULLY DIFFERENTIABLE MODEL FOR UNSUPERVISED SINGING VOICE SEPARATION

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A novel model for unsupervised music source separation was recently proposed by Schulze-Forster et al. [1]. This model addresses some of the major shortcomings of existing source separation frameworks. Specifically, it eliminates the need for isolated sources during training, performs efficiently with limited data, and can handle homogeneous sources (such as singing voice). However, it relies on an external multipitch estimator and incorporates an ad hoc voice assignment procedure. In this paper, we propose to extend this framework into a fully differentiable model by integrating a multipitch estimator and a novel differentiable assignment module within the core model. We show the merits of our approach through a set of experiments, and we highlight in particular its potential for processing diverse and unseen data.

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 946-950
Number of pages: 5
ISBN (Electronic): 9798350344851
DOIs
Publication status: Published - 1 Jan 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Unsupervised source separation
  • deep learning
  • differentiable models
  • multiple singing voices
