TY - GEN
T1 - Cauchy multichannel speech enhancement with a deep speech prior
AU - Fontaine, Mathieu
AU - Nugraha, Aditya Arie
AU - Badeau, Roland
AU - Yoshii, Kazuyoshi
AU - Liutkus, Antoine
N1 - Publisher Copyright:
© 2019, IEEE
PY - 2019/9/1
Y1 - 2019/9/1
AB - We propose a semi-supervised multichannel speech enhancement system based on a probabilistic model which assumes that both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution. As we advocate, this allows handling strong and adverse noisy conditions. Consequently, the model is parameterized by the source magnitude spectrograms and the source spatial scatter matrices. To deal with the nonadditivity of scatter matrices, our first contribution is to perform the enhancement on a projected space. Then, our second contribution is to combine a latent variable model for speech, which is trained by following the variational autoencoder framework, with a low-rank model for the noise source. At test time, an iterative inference algorithm is applied, which produces estimated parameters to use for separation. The speech latent variables are estimated first from the noisy speech and then updated by a gradient descent method, while a majorization-equalization strategy is used to update both the noise and the spatial parameters of both sources. Our experimental results show that the Cauchy model outperforms the state-of-the-art methods. The standard deviation scores also reveal that the proposed method is more robust against non-stationary noise.
KW - Multichannel speech enhancement
KW - Multivariate complex Cauchy distribution
KW - Nonnegative matrix factorization
KW - Variational autoencoder
U2 - 10.23919/EUSIPCO.2019.8903091
DO - 10.23919/EUSIPCO.2019.8903091
M3 - Conference contribution
AN - SCOPUS:85075618840
T3 - European Signal Processing Conference
BT - EUSIPCO 2019 - 27th European Signal Processing Conference
PB - European Signal Processing Conference, EUSIPCO
T2 - 27th European Signal Processing Conference, EUSIPCO 2019
Y2 - 2 September 2019 through 6 September 2019
ER -