TY - GEN
T1 - Multichannel audio source separation
T2 - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
AU - Leglaive, Simon
AU - Badeau, Roland
AU - Richard, Gael
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/16
Y1 - 2017/6/16
AB - A great number of methods for multichannel audio source separation are based on probabilistic approaches in which the sources are modeled as latent random variables in a Time-Frequency (TF) domain. For reverberant mixtures, it is common to approximate the time-domain convolutive mixing process as being instantaneous in the short-term Fourier transform domain, under a short mixing filters assumption. The TF latent sources are then inferred from the TF mixture observations. In this paper we propose to infer the TF latent sources from the time-domain observations. This approach allows us to exactly model the convolutive mixing process. The inference procedure relies on a variational expectation-maximization algorithm. In significant reverberation conditions, our approach leads to a signal-to-distortion ratio improvement of 5.5 dB compared with the usual TF approximation of the convolutive mixing process.
KW - Multichannel audio source separation
KW - nonnegative matrix factorization
KW - time-domain convolutive model
KW - time-frequency source model
KW - variational EM algorithm
U2 - 10.1109/ICASSP.2017.7951791
DO - 10.1109/ICASSP.2017.7951791
M3 - Conference contribution
AN - SCOPUS:85023763543
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 26
EP - 30
BT - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 March 2017 through 9 March 2017
ER -