
Music structure boundaries estimation using multiple self-similarity matrices as input depth of convolutional neural networks

Research output: Contribution to conference, Paper, peer-review

Abstract

In this paper, we propose a new representation as input of a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous works used a network performing the late-fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity-lag-matrix. We propose here to use the square sub-matrices centered on the main diagonals of several self-similarity-matrices, each one representing a different audio descriptor. We propose to combine them using the depth of the input layer. We show that this representation improves the results over the use of the self-similarity-lag-matrix. We also show that using the depth of the input layer provides a convenient way for early fusion of audio representations.
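
The input representation described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cosine similarity measure, the patch size, the zero-padding at the edges, the random placeholder descriptors, and the function names (`self_similarity`, `diagonal_patch`, `stacked_input`) are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix of a (n_frames, n_dims) feature sequence.

    Cosine similarity is an assumption; the abstract does not name a measure.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def diagonal_patch(ssm, center, half_width):
    """Square sub-matrix of side 2*half_width+1 centered on (center, center).

    The matrix is zero-padded so that patches near the edges keep a fixed
    size (an assumed convention).
    """
    padded = np.pad(ssm, half_width, mode="constant")
    # After padding, original index i sits at i + half_width, so the patch
    # covering [center - half_width, center + half_width] starts at `center`.
    stop = center + 2 * half_width + 1
    return padded[center:stop, center:stop]


def stacked_input(feature_sets, center, half_width):
    """Stack one diagonal patch per descriptor along the depth (last) axis."""
    patches = [
        diagonal_patch(self_similarity(f), center, half_width)
        for f in feature_sets
    ]
    return np.stack(patches, axis=-1)


# Placeholder data standing in for two different audio descriptors.
rng = np.random.default_rng(0)
n_frames = 200
descriptor_a = rng.standard_normal((n_frames, 20))
descriptor_b = rng.standard_normal((n_frames, 12))

x = stacked_input([descriptor_a, descriptor_b], center=50, half_width=16)
print(x.shape)  # (33, 33, 2)
```

Each descriptor contributes one channel, so the fusion happens at the first convolutional layer (early fusion), in the same way the color channels of an image are combined.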

Original language: English
Pages: 210-217
Number of pages: 8
Publication status: Published - 1 Jan 2017
Event: 3rd AES International Conference on Semantic Audio 2017 - Erlangen, Germany
Duration: 22 Jun 2017 to 24 Jun 2017

Conference

Conference: 3rd AES International Conference on Semantic Audio 2017
Country/Territory: Germany
City: Erlangen
Period: 22/06/17 to 24/06/17
