TY - GEN
T1 - Learning Multi-Pitch Estimation from Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss
AU - Weiss, Christof
AU - Peeters, Geoffroy
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Detecting the simultaneous activity of pitches in music audio recordings is a central task within music processing, commonly known as multi-pitch estimation or frame-wise polyphonic music transcription. Deep-learning approaches have recently achieved major improvements for this task, but the lack of large annotated datasets beyond the piano solo scenario still limits the full exploitation of their potential. In this paper, we propose a strategy for training a CNN-based multi-pitch estimator on weakly aligned score-audio pairs of pieces in different instrumentations. To this end, we make use of a multi-label variant of the connectionist temporal classification loss (MCTC), recently proposed for image recognition tasks. We re-formalize the MCTC loss to make it applicable to multi-pitch estimation and perform several systematic experiments to analyze its behavior and robustness to training conditions. Finally, we report multi-pitch estimation results for common datasets using weakly aligned training with MCTC, which performs similarly to systems trained on strongly aligned scores.
AB - Detecting the simultaneous activity of pitches in music audio recordings is a central task within music processing, commonly known as multi-pitch estimation or frame-wise polyphonic music transcription. Deep-learning approaches have recently achieved major improvements for this task, but the lack of large annotated datasets beyond the piano solo scenario still limits the full exploitation of their potential. In this paper, we propose a strategy for training a CNN-based multi-pitch estimator on weakly aligned score-audio pairs of pieces in different instrumentations. To this end, we make use of a multi-label variant of the connectionist temporal classification loss (MCTC), recently proposed for image recognition tasks. We re-formalize the MCTC loss to make it applicable to multi-pitch estimation and perform several systematic experiments to analyze its behavior and robustness to training conditions. Finally, we report multi-pitch estimation results for common datasets using weakly aligned training with MCTC, which performs similarly to systems trained on strongly aligned scores.
KW - CTC
KW - Music processing
KW - convolutional neural networks
KW - multi-pitch estimation
KW - music transcription
U2 - 10.1109/WASPAA52581.2021.9632740
DO - 10.1109/WASPAA52581.2021.9632740
M3 - Conference contribution
AN - SCOPUS:85123420137
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 121
EP - 125
BT - 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Y2 - 17 October 2021 through 20 October 2021
ER -