Abstract
In this paper, we propose a new input representation for a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous works used a network performing the late fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity-lag-matrix. We propose here to use the square sub-matrices centered on the main diagonals of several self-similarity-matrices, each one representing a different audio descriptor. We propose to combine them using the depth of the input layer. We show that this representation improves the results over the use of the self-similarity-lag-matrix. We also show that using the depth of the input layer provides a convenient way to perform early fusion of audio representations.
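The input representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine similarity measure, the zero padding at the edges, the patch size, the channels-last layout, the random stand-in descriptors, and all function names (`self_similarity`, `diagonal_patch`, `stacked_input`) are assumptions made for illustration only.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix of a (frames, dims) feature array.

    Assumption: cosine similarity; the abstract does not name the measure.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero-norm frames
    return unit @ unit.T


def diagonal_patch(ssm, center, half_width):
    """Square sub-matrix of side 2 * half_width + 1 centered on ssm[center, center].

    Assumption: frames outside the matrix are zero-padded.
    """
    padded = np.pad(ssm, half_width, mode="constant")
    c = center + half_width  # index of the center frame in the padded matrix
    return padded[c - half_width:c + half_width + 1,
                  c - half_width:c + half_width + 1]


def stacked_input(descriptors, center, half_width):
    """Stack one diagonal patch per descriptor along the depth (last) axis."""
    patches = [diagonal_patch(self_similarity(d), center, half_width)
               for d in descriptors]
    return np.stack(patches, axis=-1)


# Hypothetical descriptors: random arrays standing in for real audio features.
rng = np.random.default_rng(0)
n_frames = 200
descriptor_a = rng.standard_normal((n_frames, 13))
descriptor_b = rng.random((n_frames, 12))

x = stacked_input([descriptor_a, descriptor_b], center=50, half_width=16)
print(x.shape)  # (33, 33, 2)
```

Each descriptor contributes one channel, so the first convolutional layer sees all descriptors at once, which is what makes the fusion "early". A late-fusion design would instead process each representation in its own branch and merge the branches deeper in the network.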
| Original language | English |
|---|---|
| Pages | 210-217 |
| Number of pages | 8 |
| Publication status | Published - 1 Jan 2017 |
| Event | 3rd AES International Conference on Semantic Audio 2017, Erlangen, Germany (22 Jun 2017 → 24 Jun 2017) |
Conference
| Conference | 3rd AES International Conference on Semantic Audio 2017 |
|---|---|
| Country/Territory | Germany |
| City | Erlangen |
| Period | 22/06/17 → 24/06/17 |
Title: Music structure boundaries estimation using multiple self-similarity matrices as input depth of convolutional neural networks