Abstract
This chapter addresses sound scene and event classification in multiview settings, that is, settings where the observations are obtained from multiple sensors, each sensor contributing a particular view of the data (e.g., audio microphones and video cameras). We briefly introduce techniques that can effectively combine the data conveyed by the different views under analysis to obtain a better interpretation of the scene. We first provide a high-level presentation of generic methods that are particularly relevant in the context of multiview and multimodal sound scene analysis. We then present a selection of techniques used for audiovisual event detection and microphone array-based scene analysis.
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Computational Analysis of Sound Scenes and Events |
| Publisher | Springer International Publishing |
| Pages | 243-276 |
| Number of pages | 34 |
| ISBN (Electronic) | 9783319634500 |
| ISBN (Print) | 9783319634494 |
| DOIs | |
| Publication status | Published - 21 Sept 2017 |
| Externally published | Yes |
Keywords
- Audio source localization and tracking
- Audio source separation
- Beamforming
- Data fusion
- Joint audiovisual scene analysis
- Matrix factorization
- Multichannel Wiener filtering
- Multichannel audio
- Multimodal scene analysis
- Multiview scene analysis
- Representation learning
- Tensor factorization
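As a minimal illustration of one of the listed techniques, delay-and-sum beamforming aligns the per-channel arrival delays of a target source and averages the microphone signals, reinforcing the source relative to incoherent noise. The function and toy signals below are illustrative assumptions, not code from the chapter:

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """Delay-and-sum beamformer with integer sample delays (toy sketch).

    signals: (n_channels, n_samples) array of microphone signals.
    delays_samples: per-channel integer delays relative to a reference mic.
    """
    n_ch = signals.shape[0]
    out = np.zeros(signals.shape[1])
    for ch in range(n_ch):
        # Advance each channel by its delay so the target source aligns.
        out += np.roll(signals[ch], -delays_samples[ch])
    return out / n_ch

# Toy example: a unit pulse reaching the second microphone one sample later.
x = np.zeros(64)
x[10] = 1.0
mics = np.stack([x, np.roll(x, 1)])  # channel 1 lags channel 0 by 1 sample
y = delay_and_sum(mics, [0, 1])      # pulses re-align and average to 1.0
```

Real implementations operate with fractional delays (typically via phase shifts in the short-time Fourier domain), but the alignment-then-average principle is the same.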