
A Framework to Learn with Interpretation

  • Institut Polytechnique de Paris

Research output: Chapter in book, report, anthology, or collection › Conference contribution › Peer-reviewed

Abstract

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
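The architecture outlined in the abstract (attribute functions computed from hidden-layer outputs, a linear classifier on top, an entropy-based conciseness term, and fidelity to the predictor's output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the ReLU attribute layer, the normalization used for the entropy, and the cross-entropy form of the fidelity term are all assumptions chosen for illustration, and the input-fidelity term mentioned in the abstract is omitted.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def interpreter_forward(hidden, W_attr, W_out):
    """Hypothetical interpreter: hidden-layer outputs -> attributes -> linear classifier.

    hidden: (n_samples, n_hidden) outputs of a selected hidden layer
    W_attr: (n_hidden, n_attributes) weights of the attribute dictionary (assumed linear + ReLU)
    W_out:  (n_attributes, n_classes) weights of the linear classifier
    """
    attrs = np.maximum(hidden @ W_attr, 0.0)  # non-negative attribute activations
    logits = attrs @ W_out                    # linear classifier on the attributes
    return attrs, logits

def attribute_entropy(attrs, eps=1e-12):
    """Per-sample Shannon entropy of the normalized attribute activations.

    Low entropy means few attributes are active for a sample (assumed conciseness proxy).
    """
    a = np.maximum(attrs, 0.0)
    p = a / (a.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log(p + eps)).sum(axis=1)

def output_fidelity(predictor_logits, interpreter_logits, eps=1e-12):
    """Mean cross-entropy between predictor and interpreter class distributions (assumed form)."""
    q = softmax(predictor_logits)
    p = softmax(interpreter_logits)
    return float(-(q * np.log(p + eps)).sum(axis=1).mean())

def interpretation_loss(predictor_logits, hidden, W_attr, W_out, lam=0.1):
    """Output fidelity plus lam times mean attribute entropy; lam is an illustrative weight."""
    attrs, logits = interpreter_forward(hidden, W_attr, W_out)
    return output_fidelity(predictor_logits, logits) + lam * float(attribute_entropy(attrs).mean())
```

Under this sketch, a sample whose activation is concentrated on a single attribute has entropy close to 0, while a sample that activates K attributes equally has entropy log K, so minimizing the entropy term favors explanations that use few attributes per sample.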

Original language: English
Title: Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Publisher: Neural Information Processing Systems Foundation
Pages: 24273-24285
Number of pages: 13
ISBN (Electronic): 9781713845393
Status: Published - 1 Jan 2021
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 6 Dec 2021 to 14 Dec 2021

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 29
ISSN (Print): 1049-5258

Conference

Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
City: Virtual, Online
Period: 6/12/21 to 14/12/21
