Automatic Recognition of Sound Categories from Their Vocal Imitation Using Audio Primitives Automatically Found by SI-PLCA and HMM

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper we study the automatic recognition of sound categories (such as fridge, mixer or sawing sounds) from their vocal imitations. Vocal imitations consist of a temporal succession of sounds produced using vocal mechanisms that can differ greatly from those used in speech. We develop a recognition approach inspired by automatic-speech-recognition systems, with an acoustic model (that maps the audio signal to a set of probabilities over “phonemes”) and a language model (that represents the expected succession of “phonemes” for each sound category). Since we do not know what the underlying “phonemes” of vocal imitations are, we propose to estimate them automatically using Shift-Invariant Probabilistic Latent Component Analysis (SI-PLCA) applied to a dataset of vocal imitations. The kernel distributions of the SI-PLCA are treated as the “phonemes” of vocal imitation, and its impulse distributions are used to compute the emission probabilities of the states of a set of Hidden Markov Models (HMMs). To evaluate our proposal, we test it on the task of automatically recognizing 12 sound categories from their vocal imitations.
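
As a rough illustration of the recognition step described above, the following Python sketch scores a sequence of per-frame "phoneme" likelihoods against one HMM per sound category with the scaled forward algorithm and returns the best-scoring category. It is a minimal sketch under assumptions that the abstract does not state: each HMM has one state per "phoneme", the per-frame likelihoods are taken as given (in the paper they would be derived from the SI-PLCA impulse distributions), and the decision rule is maximum likelihood. The function names, the model names and all numbers are hypothetical.

```python
import math


def forward_log_likelihood(emissions, initial, transition):
    """Log-likelihood of a frame sequence under one HMM (scaled forward pass).

    emissions[t][s] : likelihood of frame t under state s (assumed given)
    initial[s]      : probability of starting in state s
    transition[i][j]: probability of moving from state i to state j
    """
    n_states = len(initial)
    alpha = [initial[s] * emissions[0][s] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(emissions)):
        if t > 0:
            # Propagate the previous (normalized) forward variables one step.
            alpha = [
                emissions[t][j]
                * sum(alpha[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Normalize to avoid underflow; accumulate the log of the scale factor.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def classify(emissions, models):
    """Return the name of the category whose HMM gives the highest likelihood.

    models maps a category name to a pair (initial, transition).
    """
    return max(
        models,
        key=lambda name: forward_log_likelihood(emissions, *models[name]),
    )


# Hypothetical toy setup: two "phonemes" and two categories.
models = {
    "steady": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]),
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]]),
}
alternating_frames = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
steady_frames = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]

print(classify(alternating_frames, models))
print(classify(steady_frames, models))
```

In this toy setup the transition matrix plays the role of the "language model": the two categories share the same "phonemes" and differ only in how those "phonemes" are expected to follow one another over time.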

Original language: English
Title of host publication: Music Technology with Swing - 13th International Symposium, CMMR 2017, Revised Selected Papers
Editors: Matthew E.P. Davies, Mitsuko Aramaki, Richard Kronland-Martinet, Sølvi Ystad
Publisher: Springer Verlag
Pages: 3-22
Number of pages: 20
ISBN (Print): 9783030016913
Publication status: Published - 1 Jan 2018
Event: 13th International Symposium on Computer Music Multidisciplinary Research, CMMR 2017 - Matosinhos, Portugal
Duration: 25 Sept 2017 to 28 Sept 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11265 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Symposium on Computer Music Multidisciplinary Research, CMMR 2017
Country/Territory: Portugal
City: Matosinhos
Period: 25/09/17 to 28/09/17

Keywords

  • Hidden Markov model
  • Shift-invariant probabilistic-latent-component-analysis
  • Sound design
  • Sound recognition
  • Vocal imitation
