
Fusion Methods for Speech Enhancement and Audio Source Separation

Research output: Contribution to journal › Article › peer-review

Abstract

A wide variety of audio source separation techniques exist and can already tackle many challenging industrial issues. However, in contrast with other application domains, fusion principles were rarely investigated in audio source separation despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework which takes advantage of the diversity of existing separation techniques in order to improve separation quality. We obtain new source estimates by summing the individual estimates given by different separation techniques weighted by a set of fusion coefficients. We investigate three alternative fusion methods which are based on standard nonlinear optimization, Bayesian model averaging, or deep neural networks. Experiments conducted for both speech enhancement and singing voice extraction demonstrate that all the proposed methods outperform traditional model selection. The use of deep neural networks for the estimation of time-varying coefficients notably leads to large quality improvements, up to 3 dB in terms of signal-to-distortion ratio compared to model selection.
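The fusion rule described in the abstract — new source estimates obtained by summing the estimates from different separation techniques, weighted by fusion coefficients — can be sketched as follows. This is an illustrative outline only; the function name, array shapes, and the use of time-invariant weights are assumptions, and the paper's best-performing variant estimates time-varying coefficients with deep neural networks.

```python
import numpy as np

def fuse_estimates(estimates, weights):
    # estimates: array of shape (n_methods, n_samples) holding one
    # source estimate per separation technique (hypothetical layout).
    # weights: one fusion coefficient per technique; here they are
    # time-invariant, whereas the paper also studies time-varying ones.
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of the individual estimates along the method axis.
    return np.tensordot(weights, estimates, axes=1)
```

With uniform weights this reduces to plain averaging of the estimates; the paper's contribution lies in how the coefficients are chosen (nonlinear optimization, Bayesian model averaging, or DNNs).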

Original language: English
Pages (from-to): 1266-1279
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 7
DOIs
Publication status: Published - 1 Jul 2016

Keywords

  • Audio source separation
  • aggregation
  • deep learning
  • deep neural networks (DNNs)
  • ensemble
  • fusion
  • model averaging
  • non-negative matrix factorization (NMF)
  • singing voice extraction
  • speech enhancement
  • variational Bayes

