Abstract
This chapter provides an overview of how deep learning techniques can be used for audio signals. We first review the main DNN architectures, meta-architectures and training paradigms used for audio processing. By highlighting the specificities of the audio signal, we discuss the various audio representations that can be used as input to a DNN (time and frequency representations, waveform representations and knowledge-driven representations) and discuss how the first layers of a DNN can be designed to take these specificities into account. We then review a set of applications for three main classes of problems: audio recognition, audio processing and audio generation. We do this considering two types of audio content that are less commonly addressed in the literature: music and environmental sounds.
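As an illustration of the time-frequency representations mentioned above, here is a minimal sketch of computing a magnitude spectrogram from a raw waveform via a short-time Fourier transform. It uses only NumPy; the function name and parameters are our own illustrative choices, not taken from the chapter.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed STFT.

    Illustrative sketch only; parameter names are our own
    choices, not prescribed by the chapter.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Real FFT of each frame -> (n_frames, n_fft // 2 + 1) frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of a 440 Hz sine at a 16 kHz sampling rate
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
S = magnitude_spectrogram(x)
print(S.shape)  # (61, 257)
```

Such a 2-D time-frequency matrix (frames by frequency bins) is a common input format for convolutional architectures, whereas waveform-based models consume `x` directly.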
| Original language | English |
|---|---|
| Title of host publication | Multi-faceted Deep Learning |
| Subtitle of host publication | Models and Data |
| Publisher | Springer International Publishing |
| Pages | 231-266 |
| Number of pages | 36 |
| ISBN (Electronic) | 9783030744786 |
| ISBN (Print) | 9783030744779 |
| DOIs | |
| Publication status | Published - 20 Oct 2021 |
| Chapter title | Deep learning for audio and music |