Guiding audio source separation by video object information

Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q.K. Duong, Patrick Perez, Gael Richard

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this work, we propose novel joint and sequential multimodal approaches for the task of single-channel audio source separation in videos. This is done within the popular non-negative matrix factorization (NMF) framework, using information about the sounding object's motion. Specifically, we present methods that use a non-negative least squares formulation to couple motion and audio information. The proposed techniques generalize recent work on NMF-based motion-informed source separation and extend easily to video data. Experiments with two distinct multimodal datasets of string instrument performance recordings illustrate their advantages over existing methods.
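The abstract describes the approach only at a high level. The following is a minimal, hypothetical sketch of a *sequential* NMF-plus-NNLS coupling in Python with NumPy; it is not the authors' algorithm. It assumes a Euclidean NMF with Lee-Seung multiplicative updates, one nonnegative motion feature per source per frame (for example, the average motion magnitude of each object), a projected-gradient NNLS solver, hard assignment of each NMF component to the source whose motion best explains it, and Wiener-like soft masking. The names `nmf`, `nnls_pg` and `motion_informed_separation` are illustrative only.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Euclidean NMF via Lee-Seung multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nnls_pg(A, B, n_iter=500):
    """Solve min_X ||A @ X - B||_F^2 subject to X >= 0 by projected gradient."""
    X = np.zeros((A.shape[1], B.shape[1]))
    # Step 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = A.T @ (A @ X - B)
        X = np.maximum(X - step * grad, 0.0)  # project onto X >= 0
    return X


def motion_informed_separation(V, M, rank, eps=1e-12):
    """Hypothetical sequential scheme: NMF on audio, then NNLS coupling to motion.

    V: (F, N) nonnegative magnitude spectrogram of the mixture.
    M: (J, N) nonnegative motion features, one row per visual object/source.
    Returns J nonnegative (F, N) source estimates and the component owners.
    """
    W, H = nmf(V, rank)
    # Regress activations on motion: H ~= A @ M with A >= 0,
    # solved as M.T @ A.T ~= H.T, so nnls_pg returns A.T with shape (J, rank).
    A = nnls_pg(M.T, H.T).T            # (rank, J)
    owner = np.argmax(A, axis=1)       # source whose motion best explains each component
    WH = W @ H + eps
    estimates = []
    for j in range(M.shape[0]):
        idx = owner == j
        Vj = W[:, idx] @ H[idx, :]     # partial reconstruction from source j's components
        estimates.append(Vj / WH * V)  # Wiener-like soft mask applied to the mixture
    return estimates, owner


# Synthetic usage: two sources whose activations follow their motion features.
rng = np.random.default_rng(1)
M_demo = rng.random((2, 200))
V_demo = rng.random((30, 2)) @ M_demo
est_demo, owner_demo = motion_informed_separation(V_demo, M_demo, rank=2)
```

Because the masks of all sources sum to one, the estimates add back up to the mixture spectrogram. A joint variant would instead optimize the factorization and the motion coupling together, for example by adding a penalty such as λ‖H − AM‖² to the NMF cost; this is likewise an assumption about the general idea, not a reproduction of the paper's formulation.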

Original language: English
Title of host publication: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 61-65
Number of pages: 5
ISBN (Electronic): 9781538616321
Publication status: Published - 7 Dec 2017
Externally published: Yes
Event: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017 - New Paltz, United States
Duration: 15 Oct 2017 to 18 Oct 2017

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2017-October

Conference

Conference: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
Country/Territory: United States
City: New Paltz
Period: 15/10/17 to 18/10/17

Keywords

  • Audio source separation
  • Audio-visual objects
  • Motion
  • Multimodal analysis
  • Nonnegative matrix factorization
