Downbeat detection with conditional random fields and deep learned features

Simon Durand, Slim Essid

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we introduce a novel Conditional Random Field (CRF) system that detects the downbeat sequence of musical audio signals. Feature functions are computed from four deep learned representations based on harmony, rhythm, melody and bass content to take advantage of the high-level and multi-faceted aspect of this task. Downbeats being dynamic, the powerful CRF classification system allows us to combine our features with an adapted temporal model in a fully data-driven fashion. Some meters being under-represented in our training set, we show that data augmentation enables a statistically significant improvement of the results by taking into account class imbalance. An evaluation of different configurations of our system on nine datasets shows its efficiency and potential over a heuristic-based approach and four downbeat tracking algorithms.
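To illustrate how per-frame feature scores can be combined with a temporal model in a linear-chain model, the following is a minimal sketch of Viterbi decoding, the standard inference step for linear-chain CRFs. It is not the authors' implementation: the state layout (one state per position in a bar, with state 0 as the downbeat), the function names `viterbi` and `cyclic_transitions`, the `stay_penalty` value, and the toy scores are all assumptions made for this example. In the paper, the potentials are learned from data and the features come from deep networks.

```python
def viterbi(unary, transition):
    """Return the highest-scoring state sequence of a linear-chain model.

    unary[t][s]      : log-potential of state s at time step t
    transition[p][s] : log-potential of moving from state p to state s
    """
    n_states = len(transition)
    score = list(unary[0])
    backpointers = []
    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at time t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + transition[p][s])
            new_score.append(score[best_prev] + transition[best_prev][s] + unary[t][s])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def cyclic_transitions(n_positions, stay_penalty=-5.0):
    """Hypothetical temporal model: position k is normally followed by k+1 (mod bar length)."""
    return [[0.0 if s == (p + 1) % n_positions else stay_penalty
             for s in range(n_positions)]
            for p in range(n_positions)]


# Made-up downbeat scores for 8 time steps in a 4-position bar.
downbeat_logit = [2.0, -1.0, -1.5, -1.0, 1.8, -0.5, 0.3, -1.2]
# Only state 0 (the downbeat) receives the feature score; other states score 0.
unary = [[d if s == 0 else 0.0 for s in range(4)] for d in downbeat_logit]

path = viterbi(unary, cyclic_transitions(4))
downbeats = [t for t, s in enumerate(path) if s == 0]
print(path)       # [0, 1, 2, 3, 0, 1, 2, 3]
print(downbeats)  # [0, 4]
```

In this toy run the weak positive score at step 6 is overridden by the temporal model, which is the kind of interaction between features and temporal structure that the abstract describes.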

Original language: English
Title of host publication: Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016
Editors: Michael I. Mandel, Johanna Devaney, Douglas Turnbull, George Tzanetakis
Publisher: International Society for Music Information Retrieval
Pages: 386-392
Number of pages: 7
ISBN (Electronic): 9780692755068
Publication status: Published - 1 Jan 2016
Externally published: Yes
Event: 17th International Society for Music Information Retrieval Conference, ISMIR 2016 - New York, United States
Duration: 7 Aug 2016 to 11 Aug 2016

Publication series

Name: Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016

Conference

Conference: 17th International Society for Music Information Retrieval Conference, ISMIR 2016
Country/Territory: United States
City: New York
Period: 7/08/16 to 11/08/16
