
ADAPTING PITCH-BASED SELF SUPERVISED LEARNING MODELS FOR TEMPO ESTIMATION

  • Institut Polytechnique de Paris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tempo estimation is the task of estimating the periodicity of the dominant rhythm pulse of a music audio signal. It is therefore closely related to dominant pitch estimation. Recently, both tasks have been addressed in a Self-Supervised Learning (SSL) fashion so as to leverage unlabelled data for training. In this work, we study the applicability of two successful pitch-based SSL models, SPICE and PESTO, to tempo estimation. Both successfully exploit Siamese networks, with pitch shifting used to generate the views for the two branches. To apply these models to tempo estimation, we represent the audio signal by the Constant-Q transform (CQT) of its onset-strength function and adapt their view generation to use time-stretching (instead of pitch shifting), which is efficiently implemented by shifting the CQT. In a large-scale experiment, we show that simply adapting PESTO in this way outperforms the previous SSL approach to tempo estimation on most datasets used in the reference benchmark. Further, since PESTO is lightweight and requires only a small amount of training data, we study a new learning scheme in which the downstream datasets are processed directly in an SSL fashion (without access to labels), and show that this is an interesting alternative that further improves performance on some datasets.
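The equivalence between time-stretching and shifting the CQT can be illustrated with a small numerical sketch. It is not the authors' implementation: the bin resolution, the lowest tempo, the helper names, and the convention that a rate above 1 means a speed-up are all assumptions made for illustration. On a log-spaced tempo axis with B bins per octave, multiplying the tempo by a factor r moves a peak by roughly B * log2(r) bins.

```python
import numpy as np

# Hypothetical axis parameters, chosen only for this illustration.
BINS_PER_OCTAVE = 24   # resolution of the log-spaced tempo axis
MIN_BPM = 30.0         # tempo represented by bin 0
N_BINS = 96            # four octaves: 30 BPM up to (almost) 480 BPM


def bin_to_bpm(k):
    """Tempo (BPM) at the centre of bin k on the log-spaced axis."""
    return MIN_BPM * 2.0 ** (np.asarray(k) / BINS_PER_OCTAVE)


def bpm_to_bin(bpm):
    """Nearest bin index for a tempo given in BPM."""
    return int(round(BINS_PER_OCTAVE * np.log2(bpm / MIN_BPM)))


def stretch_to_shift(rate):
    """Bin shift approximating a speed-up by `rate` (assumed convention)."""
    return int(round(BINS_PER_OCTAVE * np.log2(rate)))


def shift_bins(frame, shift):
    """Shift a frame along the bin axis, zero-padding instead of wrapping."""
    out = np.zeros_like(frame)
    if shift > 0:
        out[shift:] = frame[:-shift]
    elif shift < 0:
        out[:shift] = frame[-shift:]
    else:
        out[:] = frame
    return out


# Toy "CQT frame" with a single peak at 120 BPM.
frame = np.zeros(N_BINS)
frame[bpm_to_bin(120.0)] = 1.0

# Speeding up by 1.5x becomes a plain shift along the bin axis.
shifted = shift_bins(frame, stretch_to_shift(1.5))
estimated = float(bin_to_bpm(np.argmax(shifted)))
print(f"peak after shift is near {estimated:.1f} BPM")
```

Because the shift is rounded to a whole number of bins, the peak lands close to, but not exactly at, 180 BPM. A finer axis (more bins per octave) reduces that rounding error.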

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 956-960
Number of pages: 5
ISBN (Electronic): 9798350344851
Publication status: Published - 1 Jan 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • self-supervised-learning
  • tempo estimation
