Main melody extraction with source-filter NMF and CRNN

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Estimating the main melody of a polyphonic audio recording remains a challenging task. We approach the task from a classification perspective and adopt a convolutional recurrent neural network (CRNN) architecture that relies on a particular form of pretraining by source-filter nonnegative matrix factorisation (NMF). The source-filter NMF decomposition is chosen for its ability to capture the pitch and timbre content of the leading voice or instrument, providing a better initial pitch salience than standard time-frequency representations. Starting from this musically motivated representation, we propose to further enhance the NMF-based salience representation with CNN layers, then to model the temporal structure with an RNN, and finally to estimate the dominant melody with a classification layer. The results show that such a system achieves state-of-the-art performance on the MedleyDB dataset without any augmentation methods or large training sets.
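The classification framing in the abstract can be illustrated with a minimal sketch of how per-frame class scores might be decoded into a melody contour. Everything specific here is an assumption made for illustration and is not taken from the paper: the pitch grid (`F_MIN_HZ`, `BINS_PER_OCTAVE`, `N_PITCH_CLASSES`), the reserved `UNVOICED` class, and the helper names `class_to_hz`, `softmax` and `decode_melody` are all hypothetical. The sketch covers only the final decoding step, not the source-filter NMF or the CRNN itself.

```python
import math

# Hypothetical pitch grid (illustrative assumptions, not values from the paper).
F_MIN_HZ = 55.0          # frequency of the lowest pitch class
BINS_PER_OCTAVE = 12     # semitone resolution
N_PITCH_CLASSES = 62     # 1 "no melody" class + 61 pitch classes (five octaves)
UNVOICED = 0             # class index assumed to mean "no melody in this frame"


def class_to_hz(k):
    """Map a class index to a frequency in Hz (0.0 for the unvoiced class)."""
    if k == UNVOICED:
        return 0.0
    return F_MIN_HZ * 2.0 ** ((k - 1) / BINS_PER_OCTAVE)


def softmax(scores):
    """Numerically stable softmax over one frame of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_melody(frame_scores):
    """Pick the most probable class in each frame and convert it to Hz."""
    melody = []
    for scores in frame_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        melody.append(class_to_hz(best))
    return melody


if __name__ == "__main__":
    # Two toy frames: one voiced (class 13), one unvoiced.
    frames = [[0.0] * N_PITCH_CLASSES for _ in range(2)]
    frames[0][13] = 5.0
    frames[1][UNVOICED] = 5.0
    print(decode_melody(frames))
```

In this sketch each frame is decoded independently with an argmax. In the system the abstract describes, temporal structure is modelled by the recurrent layers before the classification layer, so the scores fed to a step like this would already carry temporal context.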

Original language: English
Title of host publication: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Editors: Emilia Gomez, Xiao Hu, Eric Humphrey, Emmanouil Benetos
Publisher: International Society for Music Information Retrieval
Pages: 82-89
Number of pages: 8
ISBN (Electronic): 9782954035123
Publication status: Published - 1 Jan 2018
Event: 19th International Society for Music Information Retrieval Conference, ISMIR 2018 - Paris, France
Duration: 23 Sept 2018 to 27 Sept 2018

Publication series

Name: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018

Conference

Conference: 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Country/Territory: France
City: Paris
Period: 23/09/18 to 27/09/18
