Abstract
In this paper, we propose a new input representation for a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous works used a network performing the late fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity lag matrix. We propose here to use the square sub-matrices centered on the main diagonals of several self-similarity matrices, each one representing a different audio descriptor. We propose to combine them using the depth of the input layer. We show that this representation improves the results over the use of the self-similarity lag matrix. We also show that using the depth of the input layer provides a convenient way to perform early fusion of audio representations.
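The input construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine similarity measure, the zero-padding at the edges, the patch size, the channel-last layout, and the function and variable names (`self_similarity`, `diagonal_patch`, `stacked_input`, `mfcc_like`, `chroma_like`) are all assumptions made for the example, and the random arrays only stand in for real audio descriptors.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix of a (n_frames, n_dims) feature sequence.

    Cosine similarity is an assumption; the abstract does not name the measure.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def diagonal_patch(ssm, center, half_width):
    """Square sub-matrix of side 2*half_width + 1 centered on (center, center).

    The matrix is zero-padded so patches near the start or end keep their size
    (an assumed edge policy).
    """
    padded = np.pad(ssm, half_width, mode="constant")
    size = 2 * half_width + 1
    # After padding, original index `center` sits at `center + half_width`,
    # so the patch starts at padded index `center`.
    return padded[center:center + size, center:center + size]


def stacked_input(descriptors, center, half_width):
    """Stack one diagonal patch per descriptor along the depth (channel) axis."""
    patches = [
        diagonal_patch(self_similarity(d), center, half_width)
        for d in descriptors
    ]
    return np.stack(patches, axis=-1)  # shape: (side, side, n_descriptors)


# Hypothetical descriptors: random stand-ins for two frame-level feature types.
rng = np.random.default_rng(0)
mfcc_like = rng.normal(size=(200, 13))
chroma_like = rng.normal(size=(200, 12))

x = stacked_input([mfcc_like, chroma_like], center=100, half_width=16)
print(x.shape)  # (33, 33, 2)
```

Each descriptor contributes one channel, so the first convolutional layer sees all descriptors at once. That is the sense in which the input depth gives early fusion, as opposed to merging separate network branches later.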
| Original language | English |
|---|---|
| Pages | 210-217 |
| Number of pages | 8 |
| Status | Published - 1 Jan 2017 |
| Event | 3rd AES International Conference on Semantic Audio 2017 - Erlangen, Germany. Duration: 22 Jun 2017 → 24 Jun 2017 |
Conference
| Conference | 3rd AES International Conference on Semantic Audio 2017 |
|---|---|
| Country/Territory | Germany |
| City | Erlangen |
| Period | 22/06/17 → 24/06/17 |
Fingerprint
Explore the research topics of "Music structure boundaries estimation using multiple self-similarity matrices as input depth of convolutional neural networks". Together they form a unique fingerprint.