TY - GEN
T1 - A MODEL YOU CAN HEAR
T2 - 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
AU - Loiseau, Romain
AU - Bouvier, Baptiste
AU - Teytaut, Yann
AU - Vincent, Elliot
AU - Aubry, Mathieu
AU - Landrieu, Loic
N1 - Publisher Copyright: © R. Loiseau, B. Bouvier, Y. Teytaut, E. Vincent, M. Aubry, and L. Landrieu.
PY - 2022/1/1
Y1 - 2022/1/1
AB - Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear.
UR - https://www.scopus.com/pages/publications/85202043786
M3 - Conference contribution
AN - SCOPUS:85202043786
T3 - Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
SP - 694
EP - 700
BT - Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
A2 - Rao, Preeti
A2 - Murthy, Hema
A2 - Srinivasamurthy, Ajay
A2 - Bittner, Rachel
A2 - Repetto, Rafael Caro
A2 - Goto, Masataka
A2 - Serra, Xavier
A2 - Miron, Marius
PB - International Society for Music Information Retrieval
Y2 - 4 December 2022 through 8 December 2022
ER -