Deep learning for audio and music

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

This chapter provides an overview of how deep learning techniques can be applied to audio signals. We first review the main DNN architectures, meta-architectures and training paradigms used for audio processing. Highlighting the specifics of the audio signal, we discuss the various possible audio representations to be used as input to a DNN (time and frequency representations, waveform representations and knowledge-driven representations) and discuss how the first layers of a DNN can be designed to take these specificities into account. We then review a set of applications for three main classes of problems: audio recognition, audio processing and audio generation. We do this considering two types of audio content that are less commonly addressed in the literature: music and environmental sounds.
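As an illustration of the time-frequency representations mentioned above, a minimal sketch follows; it is not taken from the chapter, and the function name and parameter defaults are hypothetical. It computes a magnitude spectrogram with a framed FFT, a typical 2-D input for a DNN operating on audio.

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram via a framed FFT with a Hann window.

    Returns an array of shape (num_frames, frame_len // 2 + 1),
    a typical time-frequency input for a DNN. Parameter defaults
    are illustrative, not taken from the chapter.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: a 440 Hz sine sampled at 16 kHz for one second.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each row of the result is one analysis frame; the column index maps to frequency as `bin * sr / frame_len`, so the energy of the 440 Hz tone concentrates around bin 28.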

Original language: English
Title of host publication: Multi-faceted Deep Learning
Subtitle of host publication: Models and Data
Publisher: Springer International Publishing
Pages: 231-266
Number of pages: 36
ISBN (Electronic): 9783030744786
ISBN (Print): 9783030744779
DOIs
Publication status: Published - 20 Oct 2021
