TY - GEN
T1 - Unsupervised Blind Source Separation with Variational Auto-Encoders
AU - Neri, Julian
AU - Badeau, Roland
AU - Depalle, Philippe
N1 - Publisher Copyright:
© 2021 European Signal Processing Conference. All rights reserved.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Supervised source separation requires expensive synthetic datasets containing clean, ground-truth source signals, while unsupervised separation requires only data mixtures. Existing unsupervised methods still use supervision to avoid over-separation and to compete with fully supervised methods. We present a new method of completely unsupervised single-channel blind source separation, based on variational auto-encoding, that automatically learns the correct number of sources in data mixtures and quantitatively outperforms existing methods. A deep inference network disentangles (separates) data mixtures into low-dimensional latent source variables. A deep generative network individually decodes each latent source into its source signal, such that their sum represents the given mixture. Qualitative and quantitative results from separation experiments on pairs of randomly mixed MNIST handwritten digits and mixed audio spectrograms demonstrate that our method outperforms state-of-the-art unsupervised and semi-supervised methods, showing promise as a solution to this long-standing problem in computer vision and audition.
AB - Supervised source separation requires expensive synthetic datasets containing clean, ground-truth source signals, while unsupervised separation requires only data mixtures. Existing unsupervised methods still use supervision to avoid over-separation and to compete with fully supervised methods. We present a new method of completely unsupervised single-channel blind source separation, based on variational auto-encoding, that automatically learns the correct number of sources in data mixtures and quantitatively outperforms existing methods. A deep inference network disentangles (separates) data mixtures into low-dimensional latent source variables. A deep generative network individually decodes each latent source into its source signal, such that their sum represents the given mixture. Qualitative and quantitative results from separation experiments on pairs of randomly mixed MNIST handwritten digits and mixed audio spectrograms demonstrate that our method outperforms state-of-the-art unsupervised and semi-supervised methods, showing promise as a solution to this long-standing problem in computer vision and audition.
KW - Bayesian inference
KW - Blind source separation
KW - Latent variable model
KW - Universal sound separation
KW - Unmixing
UR - https://www.scopus.com/pages/publications/85118303284
U2 - 10.23919/EUSIPCO54536.2021.9616154
DO - 10.23919/EUSIPCO54536.2021.9616154
M3 - Conference contribution
AN - SCOPUS:85118303284
T3 - European Signal Processing Conference
SP - 311
EP - 315
BT - 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 29th European Signal Processing Conference, EUSIPCO 2021
Y2 - 23 August 2021 through 27 August 2021
ER -