TY - GEN
T1 - Leveraging deep neural networks with nonnegative representations for improved environmental sound classification
AU - Bisot, Victor
AU - Serizel, Romain
AU - Essid, Slim
AU - Richard, Gael
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/5
Y1 - 2017/12/5
AB - This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established unsupervised and supervised NMF techniques to learn better input representations for deep neural networks. This allows us, with simple architectures, to reach performance competitive with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets as well as the best systems from the 2016 DCASE challenge.
KW - Deep Neural Networks
KW - Nonnegative Matrix Factorization
KW - Sound Classification
U2 - 10.1109/MLSP.2017.8168139
DO - 10.1109/MLSP.2017.8168139
M3 - Conference contribution
AN - SCOPUS:85042284135
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
SP - 1
EP - 6
BT - 2017 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017 - Proceedings
A2 - Ueda, Naonori
A2 - Chien, Jen-Tzung
A2 - Matsui, Tomoko
A2 - Larsen, Jan
A2 - Watanabe, Shinji
PB - IEEE Computer Society
T2 - 2017 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017
Y2 - 25 September 2017 through 28 September 2017
ER -