Abstract
In this paper, we study spectral and temporal periodicity representations that can be used to describe the rhythm characteristics of a music audio signal. A continuous-valued energy function representing the onset positions over time is first extracted from the audio signal. From this function we compute, at each time, a vector that represents the characteristics of the local rhythm. Four feature sets are studied for this vector. They are derived from the amplitude of the discrete Fourier transform (DFT), the auto-correlation function (ACF), the product of the DFT and the ACF interpolated on a hybrid lag/frequency axis, and the concatenated DFT and ACF coefficients. The vectors are then sampled at specific frequencies that represent various ratios of the local tempo. The ability of these periodicity representations to describe the rhythm characteristics of an audio item is evaluated through a classification task. In this task, we test the periodicity representations alone, combined with tempo information, and combined with a proposed set of rhythm features. The evaluation is performed using both annotated and estimated tempo. We show that such simple periodicity representations achieve high recognition rates, at least comparable to previously published results.
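As an illustration only, the four feature sets named in the abstract could be sketched as follows for a single analysis window. This is not the paper's implementation: the function names, the Hann window, the ACF normalisation, and the tempo ratios are all assumptions made for the example, and the onset energy function is taken as given rather than extracted from audio.

```python
import numpy as np


def periodicity_functions(onset_env, frame_rate):
    """DFT magnitude and normalised ACF of one window of an onset energy function.

    onset_env  : 1-D array, onset energy sampled at frame_rate (Hz).
    Returns (freqs_hz, dft_mag, lags_s, acf).
    """
    x = np.asarray(onset_env, dtype=float)
    x = x - x.mean()                      # remove the DC component
    n = len(x)

    # Spectral periodicity: magnitude of the DFT (Hann window is an assumption).
    dft_mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    freqs_hz = np.fft.rfftfreq(n, d=1.0 / frame_rate)

    # Temporal periodicity: non-negative lags of the auto-correlation function.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    if acf[0] > 0:
        acf = acf / acf[0]                # so that the zero-lag value is 1
    lags_s = np.arange(n) / frame_rate
    return freqs_hz, dft_mag, lags_s, acf


def sample_at_tempo_ratios(freqs_hz, dft_mag, lags_s, acf, tempo_bpm,
                           ratios=(0.5, 1.0, 2.0, 3.0, 4.0)):
    """Sample both functions at multiples of the tempo frequency.

    The ratios are hypothetical placeholders, not the values used in the paper.
    The ACF is read at lag 1/f, which places it on the same frequency axis as
    the DFT so that the two can be multiplied point by point.
    """
    target_hz = (tempo_bpm / 60.0) * np.asarray(ratios, dtype=float)
    dft_s = np.interp(target_hz, freqs_hz, dft_mag)
    acf_s = np.interp(1.0 / target_hz, lags_s, acf)
    return {
        "dft": dft_s,                             # feature set 1
        "acf": acf_s,                             # feature set 2
        "product": dft_s * acf_s,                 # feature set 3
        "concat": np.concatenate([dft_s, acf_s])  # feature set 4
    }
```

In a setting like the one described, these functions would be applied to successive windows of the onset energy function, with `tempo_bpm` taken from either an annotation or a tempo estimator, and the resulting vectors passed to a classifier.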
| Original language | English |
|---|---|
| Pages (from-to) | 1242-1252 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Audio, Speech and Language Processing |
| Volume | 19 |
| Issue number | 5 |
| Publication status | Published - 1 Jan 2011 |
| Externally published | Yes |
Keywords
- Audio features
- automatic indexing
- rhythm classification
- rhythm description