A MODEL YOU CAN HEAR: AUDIO IDENTIFICATION WITH PLAYABLE PROTOTYPES

  • Romain Loiseau
  • Baptiste Bouvier
  • Yann Teytaut
  • Elliot Vincent
  • Mathieu Aubry
  • Loic Landrieu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear.
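To make the idea of transformation-invariant prototype assignment concrete, here is a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked above). As simplifying assumptions, each sample is a single magnitude spectrum rather than a spectrogram, and the learned transformation networks are replaced by a closed-form, non-negative gain fitted per prototype. The names `fit_gain`, `reconstruction_error` and `assign` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical simplification: one magnitude spectrum per sample (a list of
# non-negative values over frequency bins), not a full time-frequency spectrogram.
Spectrum = Sequence[float]


def fit_gain(x: Spectrum, p: Spectrum) -> float:
    """Least-squares gain a >= 0 minimizing ||x - a * p||^2.

    Stand-in for a learned transformation network: the only transformation
    allowed here is a global amplitude scaling of the prototype.
    """
    energy = sum(v * v for v in p)
    if energy == 0.0:
        return 0.0
    a = sum(xi * pi for xi, pi in zip(x, p)) / energy
    # The objective is a convex quadratic in a, so clipping the unconstrained
    # optimum at zero gives the best non-negative gain.
    return max(a, 0.0)


def reconstruction_error(x: Spectrum, p: Spectrum) -> float:
    """Squared error between x and the best-scaled version of prototype p."""
    a = fit_gain(x, p)
    return sum((xi - a * pi) ** 2 for xi, pi in zip(x, p))


def assign(x: Spectrum, prototypes: List[Spectrum]) -> int:
    """Index of the prototype that best reconstructs x after transformation."""
    errors = [reconstruction_error(x, p) for p in prototypes]
    return min(range(len(prototypes)), key=errors.__getitem__)


if __name__ == "__main__":
    # Two toy prototypes over three frequency bins.
    protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    print(assign([2.0, 0.1, 0.0], protos))  # 0
    print(assign([0.0, 3.0, 3.0], protos))  # 1
```

Assigning each sample to the prototype with the lowest reconstruction error is what would let the same prototypes serve for clustering (no labels) and for classification (prototypes tied to classes). Because the prototypes live in the same spectral space as the inputs, they can be inspected directly rather than through an abstract embedding.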

Original language: English
Title of host publication: Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
Editors: Preeti Rao, Hema Murthy, Ajay Srinivasamurthy, Rachel Bittner, Rafael Caro Repetto, Masataka Goto, Xavier Serra, Marius Miron
Publisher: International Society for Music Information Retrieval
Pages: 694-700
Number of pages: 7
ISBN (Electronic): 9781732729926
Publication status: Published - 1 Jan 2022
Externally published: Yes
Event: 23rd International Society for Music Information Retrieval Conference, ISMIR 2022 - Hybrid, Bengaluru, India
Duration: 4 Dec 2022 to 8 Dec 2022

Publication series

Name: Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022

Conference

Conference: 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
Country/Territory: India
City: Hybrid, Bengaluru
Period: 4/12/22 to 8/12/22
