Fusion Methods for Speech Enhancement and Audio Source Separation

Abstract
A wide variety of audio source separation techniques exist and can already tackle many challenging industrial issues. However, in contrast with other application domains, fusion principles were rarely investigated in audio source separation despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework which takes advantage of the diversity of existing separation techniques in order to improve separation quality. We obtain new source estimates by summing the individual estimates given by different separation techniques weighted by a set of fusion coefficients. We investigate three alternative fusion methods which are based on standard nonlinear optimization, Bayesian model averaging, or deep neural networks. Experiments conducted for both speech enhancement and singing voice extraction demonstrate that all the proposed methods outperform traditional model selection. The use of deep neural networks for the estimation of time-varying coefficients notably leads to large quality improvements, up to 3 dB in terms of signal-to-distortion ratio compared to model selection.
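The fusion rule described in the abstract, combining the estimates from several separation techniques as a weighted sum, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the estimates are stand-in numpy arrays, and the coefficient values are illustrative rather than ones learned by nonlinear optimization, Bayesian model averaging, or a DNN.

```python
import numpy as np

def fuse_estimates(estimates, weights):
    """Fuse source estimates as a weighted sum.

    estimates: list of M arrays, each the target-source estimate
               produced by one separation technique.
    weights:   length-M sequence of fusion coefficients (e.g.
               nonnegative and summing to 1, as in model averaging).
    """
    stacked = np.stack(estimates)            # shape (M, n_samples)
    w = np.asarray(weights, dtype=float)
    # Weighted sum over the M techniques.
    return np.tensordot(w, stacked, axes=1)

# Illustrative use with three hypothetical separators' outputs.
rng = np.random.default_rng(0)
est = [rng.standard_normal(8) for _ in range(3)]

# Traditional model selection corresponds to one-hot weights;
# fusion generalizes this to soft combinations.
selected = fuse_estimates(est, [0.0, 1.0, 0.0])  # equals est[1]
fused = fuse_estimates(est, [0.2, 0.5, 0.3])     # soft combination
```

Time-varying fusion, which the abstract reports gives the largest gains, would use a separate weight vector per time frame instead of a single global one.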
| Original language | English |
|---|---|
| Pages (from-to) | 1266-1279 |
| Number of pages | 14 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 24 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - 1 Jul 2016 |
Keywords
- Audio source separation
- aggregation
- deep learning
- deep neural networks (DNNs)
- ensemble
- fusion
- model averaging
- non-negative matrix factorization (NMF)
- singing voice extraction
- speech enhancement
- variational Bayes