Learning Multi-Pitch Estimation from Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Detecting the simultaneous activity of pitches in music audio recordings is a central task within music processing, commonly known as multi-pitch estimation or frame-wise polyphonic music transcription. Deep-learning approaches have recently achieved major improvements for this task, but the lack of large annotated datasets beyond the solo piano scenario still limits how fully their potential can be exploited. In this paper, we propose a strategy for training a CNN-based multi-pitch estimator on weakly aligned score-audio pairs of pieces in different instrumentations. To this end, we make use of a multi-label variant of the connectionist temporal classification loss (MCTC), recently proposed for image recognition tasks. We re-formalize the MCTC loss to be applicable to multi-pitch estimation and perform several systematic experiments to analyze its behavior and robustness to training conditions. Finally, we report multi-pitch estimation results for common datasets using weakly aligned training with MCTC, which performs similarly to systems trained on strongly aligned scores.

Original language: English
Title of host publication: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 121-125
Number of pages: 5
ISBN (Electronic): 9781665448703
Publication status: Published - 1 Jan 2021
Event: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 - New Paltz, United States
Duration: 17 Oct 2021 to 20 Oct 2021

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2021-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Country/Territory: United States
City: New Paltz
Period: 17/10/21 to 20/10/21

Keywords

  • CTC
  • Music processing
  • convolutional neural networks
  • multi-pitch estimation
  • music transcription