HOG and subband power distribution image features for acoustic scene classification

Victor Bisot, Slim Essid, Gael Richard

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Acoustic scene classification is a difficult problem, mostly because of the high density of events occurring concurrently in audio scenes. To capture the occurrences of these events, we propose to use the Subband Power Distribution (SPD) as a feature. We extract it by computing the histogram of amplitude values in each frequency band of a spectrogram image. The SPD allows us to model the density of events in each frequency band. Our method is evaluated on a large acoustic scene dataset using support vector machines. We outperform previous methods when using the SPD in conjunction with the histogram of oriented gradients (HOG). To improve the results further, we also consider an approximation of the earth mover's distance kernel to compare histograms in a more suitable way. Using the so-called Sinkhorn kernel improves the results for most of the feature configurations. The best configuration reaches an F1 score of 92.8%.
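
The two ideas named in the abstract, a per-band histogram of spectrogram amplitudes and an entropy-regularised approximation of the earth mover's distance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the bin count, the ground cost between bins, the regularisation strength `reg`, and the iteration count are all assumptions chosen for illustration, and the abstract does not say how the distance is turned into an SVM kernel, so that step is left out.

```python
import numpy as np


def subband_power_distribution(spectrogram, n_bins=32, value_range=None):
    """Histogram of amplitude values in each frequency band.

    spectrogram: 2-D array of shape (n_bands, n_frames).
    Returns an array of shape (n_bands, n_bins). Each row sums to 1
    when value_range covers every value in the spectrogram.
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    if value_range is None:
        # Shared range so that bins are comparable across bands.
        value_range = (spectrogram.min(), spectrogram.max())
    spd = np.empty((spectrogram.shape[0], n_bins))
    for band, values in enumerate(spectrogram):
        counts, _ = np.histogram(values, bins=n_bins, range=value_range)
        spd[band] = counts / values.size
    return spd


def sinkhorn_distance(r, c, cost, reg=10.0, n_iter=200):
    """Entropy-regularised transport cost between histograms r and c.

    cost[i, j] is the (assumed) ground cost of moving mass from bin i
    to bin j. A fixed number of Sinkhorn scaling iterations is used.
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    K = np.exp(-reg * cost)
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (K.T @ u)
        u = r / (K @ v)
    # Transport plan is diag(u) K diag(v); return its total cost.
    return float(np.sum(u * ((K * cost) @ v)))


# Toy usage: random data standing in for a spectrogram image.
rng = np.random.default_rng(0)
spec = rng.random((8, 100))  # 8 frequency bands, 100 frames
spd = subband_power_distribution(spec, n_bins=16)

# Assumed ground cost: normalised absolute difference of bin indices.
bins = np.arange(16)
cost = np.abs(bins[:, None] - bins[None, :]) / 15.0
d = sinkhorn_distance(spd[0], spd[1], cost)
```

Here the histograms of two bands are compared only to exercise the distance function; in a classification setting the comparison would be between features of different recordings.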

Original language: English
Title of host publication: 2015 23rd European Signal Processing Conference, EUSIPCO 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 719-723
Number of pages: 5
ISBN (Electronic): 9780992862633
Publication status: Published - 22 Dec 2015
Externally published: Yes
Event: 23rd European Signal Processing Conference, EUSIPCO 2015 - Nice, France
Duration: 31 Aug 2015 – 4 Sept 2015

Publication series

Name: 2015 23rd European Signal Processing Conference, EUSIPCO 2015

Conference

Conference: 23rd European Signal Processing Conference, EUSIPCO 2015
Country/Territory: France
City: Nice
Period: 31/08/15 – 4/09/15

Keywords

  • Acoustic scene classification
  • Sinkhorn distance
  • Subband power distribution image
  • Support vector machine
