TY - JOUR
T1 - Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization
AU - Parekh, Jayneel
AU - Parekh, Sanjeel
AU - Mozharovskyi, Pavlo
AU - Richard, Gaël
AU - d'Alché-Buc, Florence
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
AB - This article tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
KW - Audio interpretability
KW - audio convolutional networks
KW - by-design interpretable models
KW - explainability
KW - non-negative matrix factorization
U2 - 10.1109/TASLP.2024.3358049
DO - 10.1109/TASLP.2024.3358049
M3 - Article
AN - SCOPUS:85183982015
SN - 2329-9290
VL - 32
SP - 1392
EP - 1405
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -