Abstract
Most of the time it is nearly impossible to distinguish between particular types of sound events from the waveform alone. Frequency-domain and time-frequency-domain representations have therefore been used for years, providing representations of sound signals that are more in line with human perception. However, these representations are usually too generic and often fail to describe the specific content present in a sound recording. Considerable work has been devoted to designing features that allow such specific information to be extracted, leading to a wide variety of hand-crafted features. In recent years, owing to the increasing availability of medium-scale and large-scale sound datasets, an alternative approach to feature extraction, so-called feature learning, has become popular. Finally, processing the amount of data available today can quickly become overwhelming, so it is of paramount importance to be able to reduce the size of the dataset in the feature space. This chapter describes the general processing chain that converts a sound signal into a feature vector that can be efficiently exploited by a classifier, as well as its relation to features used for speech and music processing.
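As a rough illustration of the processing chain the abstract describes (signal → time-frequency representation → pooled feature vector), the following sketch computes a log-magnitude spectrogram and pools it over time with mean and standard deviation. This is a hypothetical minimal example using only NumPy, not code from the chapter; frame length, hop size, and the pooling statistics are illustrative choices.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    # Hann-windowed short-time Fourier transform magnitudes
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def feature_vector(signal):
    # Time-frequency representation -> log compression -> temporal pooling
    spec = stft_magnitude(signal)
    log_spec = np.log(spec + 1e-10)
    # Pool over the time axis: one fixed-size vector per recording,
    # suitable as classifier input regardless of recording length
    return np.concatenate([log_spec.mean(axis=0), log_spec.std(axis=0)])

# Example: 1 s synthetic "recording" (440 Hz tone plus noise) at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(sr)
fv = feature_vector(x)
print(fv.shape)  # (514,): mean and std over 257 frequency bins
```

Hand-crafted features (e.g., MFCCs) or learned representations would replace the log-spectrogram stage, but the overall chain of representation, compression, and temporal integration is the same.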
| Original language | English |
|---|---|
| Title of host publication | Computational Analysis of Sound Scenes and Events |
| Publisher | Springer International Publishing |
| Pages | 71-101 |
| Number of pages | 31 |
| ISBN (Electronic) | 9783319634500 |
| ISBN (Print) | 9783319634494 |
| Publication status | Published - 21 Sept 2017 |
| Externally published | Yes |
Keywords
- Audio signal processing
- Audio signal representation
- Dimensionality reduction
- Feature engineering
- Feature extraction
- Feature pooling
- Feature selection
- Multiscale representation
- Perceptually motivated features
- Representation learning
- Temporal integration
- Time-frequency representation