Scale and shift invariant time/frequency representation using auditory statistics: Application to rhythm description

Ugo Marchand, Geoffroy Peeters

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this paper we propose two novel scale- and shift-invariant time-frequency representations of audio content. Scale invariance is a desirable property for describing the rhythm of an audio signal, since it yields the same representation for the same rhythm played at different tempi. This property can be achieved by expressing the time axis on a log scale, for example using the Scale Transform (ST). Since the frequency locations of the audio content are also important, we previously extended the ST to the Modulation Scale Spectrum (MSS). However, the MSS cannot represent the inter-relationship between the audio content in different frequency bands. To solve this issue, we propose two novel representations. The first is based on the 2D Scale Transform; the second is based on statistics (inspired by the auditory experiments of McDermott) that represent the inter-relationship between the frequency bands. We apply both representations to a rhythm class recognition task and demonstrate their benefits. We show that introducing auditory statistics yields a large increase in recognition results.
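The scale-invariance property mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it covers only the plain 1D Scale Transform, not the MSS, the 2D Scale Transform, or the auditory statistics. It assumes the standard definition of the ST, in which substituting t = exp(tau) turns the transform into a Fourier transform of the log-time-warped signal. The function `scale_transform_magnitude`, the toy signal `pattern`, and all parameter values are hypothetical names and choices made for illustration.

```python
import numpy as np


def scale_transform_magnitude(signal, log_t_min=-12.0, log_t_max=12.0, n=4096):
    """Magnitude of a discretized Scale Transform of `signal`, a callable of t > 0.

    Illustrative sketch only; the constant factor 1/sqrt(2*pi) is omitted.
    """
    # Sample uniformly on a log-time axis: tau = ln(t).
    tau = np.linspace(log_t_min, log_t_max, n, endpoint=False)
    t = np.exp(tau)
    # With t = exp(tau), the Scale Transform becomes the Fourier transform
    # of signal(exp(tau)) * exp(tau / 2).
    warped = signal(t) * np.sqrt(t)
    d_tau = tau[1] - tau[0]
    return np.abs(np.fft.rfft(warped)) * d_tau


def pattern(t):
    # Toy signal (an assumption for this demo): smooth and decaying at both
    # ends of the log-time axis, so truncation effects are negligible.
    return np.exp(-0.5 * np.log(t) ** 2) / np.sqrt(t)


tempo_ratio = 2.0  # the same pattern "played" twice as fast


def faster_pattern(t):
    # Time-scaled copy with the energy-preserving factor sqrt(tempo_ratio).
    return np.sqrt(tempo_ratio) * pattern(tempo_ratio * t)


reference = scale_transform_magnitude(pattern)
faster = scale_transform_magnitude(faster_pattern)

# Time scaling becomes a shift along the log-time axis, and a shift only
# changes the phase of the Fourier transform, so the magnitudes should agree.
max_difference = float(np.max(np.abs(reference - faster)))
print(max_difference)
```

Without the `sqrt(tempo_ratio)` normalization, the two magnitudes would differ only by a constant factor, which is why the magnitude of the ST is a natural tempo-independent descriptor.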

Original language: English
Title of host publication: 2016 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings
Editors: Kostas Diamantaras, Aurelio Uncini, Francesco A. N. Palmieri, Jan Larsen
Publisher: IEEE Computer Society
ISBN (Electronic): 9781509007462
Publication status: Published - 8 Nov 2016
Externally published: Yes
Event: 26th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings - Vietri sul Mare, Salerno, Italy
Duration: 13 Sept 2016 to 16 Sept 2016

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Volume: 2016-November
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 26th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings
Country/Territory: Italy
City: Vietri sul Mare, Salerno
Period: 13/09/16 to 16/09/16

Keywords

  • 2D-Fourier
  • 2D-Scale
  • Fourier-Mellin Transform
  • auditory statistics
  • rhythm description
