
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures

  • Alain Riou
  • Antonin Gagneré
  • Gaëtan Hadjeres
  • Stefan Lattner
  • Geoffroy Peeters
  • Institut Polytechnique de Paris
  • Sony Computer Science Laboratory
  • Sony Corporation

Research output: Chapter in a book, report, anthology or collection › Conference contribution › Peer-reviewed

Abstract

In this paper, we tackle the task of musical stem retrieval: given a musical mix, the goal is to retrieve a stem that fits with it, i.e., one that would sound pleasant when played together with the mix. To do so, we introduce a new method based on Joint-Embedding Predictive Architectures, in which an encoder and a predictor are jointly trained to produce latent representations of a context and to predict latent representations of a target. In particular, we design our predictor to be conditioned on arbitrary instruments, enabling our model to perform zero-shot stem retrieval. In addition, we find that pretraining the encoder with contrastive learning drastically improves the model's performance. We validate the retrieval performance of our model on the MUSDB18 and MoisesDB datasets. We show that it significantly outperforms previous baselines on both datasets, showcasing its ability to support conditioning of varying precision, including on possibly unseen instruments. We also evaluate the learned embeddings on a beat tracking task, demonstrating that they retain temporal structure and local information.
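The retrieval step described in the abstract can be illustrated with a minimal sketch, assuming that retrieval ranks candidate stems by cosine similarity between the predictor's output and each candidate stem's embedding. This page does not specify the similarity measure or the predictor's architecture, so everything below is hypothetical: `toy_predictor` is a stand-in that simply adds a conditioning vector to the context embedding, the 3-dimensional vectors are made up, and the names `retrieve`, `cosine`, `l2_normalize` and the stem labels are illustrative, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def toy_predictor(context: Sequence[float], instrument: Sequence[float]) -> Vector:
    """HYPOTHETICAL stand-in for the instrument-conditioned predictor.

    A trained network would map (context embedding, instrument condition) to a
    predicted target embedding; here we just add the two vectors and normalize.
    """
    return l2_normalize([c + i for c, i in zip(context, instrument)])


def retrieve(
    context: Sequence[float],
    instrument: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[str]:
    """Rank candidate stems by cosine similarity to the predicted target embedding."""
    query = toy_predictor(context, instrument)
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


if __name__ == "__main__":
    # Made-up 3-d embeddings; in practice these would come from the trained encoder.
    mix_embedding = [1.0, 0.0, 0.0]
    guitar_condition = [0.0, 1.0, 0.0]
    stem_bank = {
        "drums_stem": [0.0, 0.0, 1.0],
        "bass_stem": [1.0, 0.0, 0.0],
        "guitar_stem": [0.7, 0.7, 0.0],
    }
    print(retrieve(mix_embedding, guitar_condition, stem_bank))
    # → ['guitar_stem', 'bass_stem', 'drums_stem']
```

Because the instrument condition in this sketch is just a vector, any embedding can be passed in without retraining the ranking step, which conveys the intuition behind zero-shot conditioning; how the paper actually encodes instruments is not described on this page.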

Original language: English
Host publication title: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
DOIs
State: Published - 1 Jan 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25
