TY - GEN
T1 - Analysis of common design choices in deep learning systems for downbeat tracking
AU - Fuentes, Magdalena
AU - McFee, Brian
AU - Crayencour, Hélène C.
AU - Essid, Slim
AU - Bello, Juan P.
N1 - Publisher Copyright: © Magdalena Fuentes, Brian McFee, Hélène C. Crayencour, Slim Essid, Juan P. Bello.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there are few insights about the role of the various design choices and the delicate interactions between them. In this paper we offer a systematic investigation of the impact of widely adopted variants. We study the effects of the temporal granularity of the input representation (i.e. beat-level vs tatum-level) and the encoding of the network's outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we exploit a state-of-the-art recurrent neural network into which we introduce those variants, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network's outputs.
M3 - Conference contribution
AN - SCOPUS:85065981409
T3 - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
SP - 106
EP - 112
BT - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
A2 - Gomez, Emilia
A2 - Hu, Xiao
A2 - Humphrey, Eric
A2 - Benetos, Emmanouil
PB - International Society for Music Information Retrieval
T2 - 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Y2 - 23 September 2018 through 27 September 2018
ER -