Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization

Research output: Contribution to journal › Article › Peer-review

Abstract

This article tackles two major problem settings for the interpretability of audio processing networks: post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret a network's decisions in terms of high-level audio objects that are also listenable for the end-user. We extend this approach to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance the parts of the input signal most relevant to a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
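The abstract does not spell out the factorization itself, so the following is only a rough sketch of the NMF ingredients it mentions, under stated assumptions. It learns a dictionary W from a non-negative magnitude spectrogram with the standard Lee-Seung multiplicative updates (Euclidean cost), estimates time-activations H for new input with W held fixed, and then keeps only the parts of the input explained by selected components through a soft, Wiener-like mask. In the article the activations are produced by a trained interpreter fed with hidden layers of the target network; here that interpreter is replaced by plain fixed-dictionary NMF, and the `relevance` weights, all function names, and the random toy data are hypothetical. Phase handling and inverse STFT (needed to make the result listenable) are omitted.

```python
import numpy as np


def learn_nmf_dictionary(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative spectrogram V (freq x time) as V ~ W @ H
    with Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def activations_for_fixed_dictionary(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative time-activations H for V with the dictionary W fixed.
    (Hypothetical stand-in for the trained interpreter described in the article.)"""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def relevance_masked_spectrogram(V, W, H, relevance, eps=1e-10):
    """Keep the parts of V explained by components weighted by `relevance`
    (shape: n_components, values in [0, 1]) using a soft ratio mask."""
    full = W @ H + eps
    selected = W @ (relevance[:, None] * H)
    mask = np.clip(selected / full, 0.0, 1.0)
    return mask * V


# Toy usage on random non-negative "spectrograms" (64 bins x frames).
rng = np.random.default_rng(1)
V_train = rng.random((64, 200))
W, H_train = learn_nmf_dictionary(V_train, n_components=8)

V_new = rng.random((64, 50))
H_new = activations_for_fixed_dictionary(V_new, W)

relevance = np.zeros(8)
relevance[[0, 3]] = 1.0  # hypothetical: components deemed relevant to a decision
V_interp = relevance_masked_spectrogram(V_new, W, H_new, relevance)
print(V_interp.shape)  # (64, 50)
```

A soft ratio mask is used here because it keeps the output a filtered version of the original input rather than a synthetic reconstruction, which is one plausible way to "enhance parts of the input signal"; the article's actual regularization and interpreter training are not reproduced.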

Original language: English
Pages (from-to): 1392-1405
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
Publication status: Published, 1 Jan 2024

Keywords

  • Audio interpretability
  • audio convolutional networks
  • by-design interpretable models
  • explainability
  • non-negative matrix factorization
