Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

  • Institut Polytechnique de Paris

Research output: Chapter in book, report, anthology or collection › Conference contribution › Peer-reviewed

Abstract

Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (MATPAC), which is trained with two pretext tasks solved jointly. As in previous work, the first pretext task is a masked latent prediction task, ensuring a robust input representation in the latent space. The second one is unsupervised classification, which utilises the latent representations of the first pretext task to match probability distributions between a teacher and a student. We validate the MATPAC method by comparing it to other state-of-the-art proposals and conducting ablation studies. MATPAC reaches state-of-the-art self-supervised learning results on reference audio classification datasets such as OpenMIC, GTZAN, ESC-50 and US8K, and outperforms the results of comparable supervised methods for musical auto-tagging on Magna-tag-a-tune.
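
The two jointly solved pretext tasks can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything concrete here is an assumption made for illustration: mean squared error for the masked latent prediction task, cross-entropy between temperature-scaled softmax outputs for the teacher/student classification task, both computed on masked positions only, and a hypothetical weight `alpha` combining them. The function names, temperatures and toy data are likewise hypothetical, and random vectors stand in for the encoder outputs.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def masked_prediction_loss(student_pred, teacher_latent, masked_idx):
    """Assumed form: MSE between predicted and target latents at masked positions."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(student_pred[i], teacher_latent[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


def classification_loss(student_logits, teacher_logits, masked_idx,
                        student_temp=0.1, teacher_temp=0.05):
    """Assumed form: cross-entropy of the student's distribution against the teacher's."""
    total = 0.0
    for i in masked_idx:
        target = softmax(teacher_logits[i], teacher_temp)
        pred = softmax(student_logits[i], student_temp)
        total -= sum(t * math.log(q + 1e-12) for t, q in zip(target, pred))
    return total / len(masked_idx)


def joint_loss(pred_loss, cls_loss, alpha=0.5):
    """Hypothetical convex combination of the two pretext losses."""
    return alpha * pred_loss + (1.0 - alpha) * cls_loss


# Toy data: 6 patches, 4-dimensional latents, 3 pseudo-classes (all hypothetical).
rng = random.Random(0)
num_patches, latent_dim, num_classes = 6, 4, 3
masked_idx = [1, 3, 4]

teacher_latent = [[rng.gauss(0, 1) for _ in range(latent_dim)]
                  for _ in range(num_patches)]
# The student's predictions are the teacher's latents plus small noise.
student_pred = [[v + rng.gauss(0, 0.1) for v in row] for row in teacher_latent]

teacher_logits = [[rng.gauss(0, 1) for _ in range(num_classes)]
                  for _ in range(num_patches)]
student_logits = [[v + rng.gauss(0, 0.1) for v in row] for row in teacher_logits]

loss = joint_loss(
    masked_prediction_loss(student_pred, teacher_latent, masked_idx),
    classification_loss(student_logits, teacher_logits, masked_idx),
)
print(f"joint loss: {loss:.4f}")
```

In this sketch the first term pulls the student's predicted latents towards the teacher's, while the second compares the two class distributions derived from those latents, which is one plausible reading of "match probability distributions between a teacher and a student".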

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
Publication status: Published - 1 Jan 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25
