Abstract
This paper deals with the automatic generation of music audio summaries from signal analysis alone, without the use of any other information. The strategy employed here is to consider the audio signal as a succession of "states" (at various scales) corresponding to the structure (at various scales) of a piece of music. This is, of course, only applicable to musical genres based on some kind of repetition. From the audio signal, we first derive dynamic features representing the time evolution of the energy content in various frequency bands. These features constitute the observations from which we derive a representation of the music in terms of "states". Since human segmentation and grouping perform better upon subsequent hearings, this "natural" multi-pass approach is followed here. The first pass of the proposed algorithm uses segmentation to create "templates". The second pass uses these templates to propose a structure of the music using unsupervised learning methods (K-means and a hidden Markov model). The audio summary is finally constructed by choosing a representative example of each state. Further refinements of the summary construction use overlap-add and tempo detection/beat alignment to improve the audio quality of the created summary.
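
The state-based idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the feature frames are already computed, replaces the segmentation pass and the hidden Markov model with plain K-means, and picks the longest contiguous run of each state as its "representative example". The function names `kmeans_states` and `representative_segments`, the evenly spaced initialisation, and the longest-run rule are all hypothetical choices made for illustration.

```python
from math import dist  # available from Python 3.8


def kmeans_states(frames, k, iterations=20):
    """Label each feature frame with one of k states using plain K-means.

    Assumption: `frames` is a time-ordered list of equal-length numeric
    vectors; the paper's dynamic features are not reproduced here.
    """
    # Deterministic initialisation (an assumption): k frames spread evenly in time.
    step = len(frames) / k
    centroids = [list(frames[int(i * step)]) for i in range(k)]
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(f, centroids[c])) for f in frames]
        # Update step: each centroid becomes the mean of its member frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def representative_segments(labels):
    """Return {state: (start, end)} for the longest run of each state (end exclusive).

    Assumption: "representative example" is taken to mean the longest run.
    """
    best = {}
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            state = labels[start]
            if state not in best or (i - start) > (best[state][1] - best[state][0]):
                best[state] = (start, i)
            start = i
    return best
```

Calling `representative_segments(kmeans_states(frames, k))` yields one frame range per state; a summary could then be assembled by concatenating the audio for those ranges, with overlap-add and beat alignment left out of this sketch.
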
| Original language | English |
|---|---|
| Publication status | Published - 1 Jan 2002 |
| Externally published | Yes |
| Event | 3rd International Symposium on Music Information Retrieval, ISMIR 2002 - Paris, France. Duration: 13 Oct 2002 → 17 Oct 2002 |
Conference
| Conference | 3rd International Symposium on Music Information Retrieval, ISMIR 2002 |
|---|---|
| Country/Territory | France |
| City | Paris |
| Period | 13/10/02 → 17/10/02 |