Soft nonnegative matrix co-factorization with application to multimodal speaker diarization

N. Seichepine, S. Essid, C. Fevotte, O. Cappe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents a new method for bimodal nonnegative matrix factorization (NMF). The method is well suited to situations where two streams of data are analyzed concurrently and are expected to be related by loosely common factors. It allows for a soft co-factorization, which takes into account the relationship between the modalities being processed but returns different factors for distinct modalities. The data associated with each modality need not live in the same feature space, nor have the same dimensionality. The co-factorization is obtained via a majorization-minimization (MM) algorithm. The behavior of the method is illustrated on both synthetic and real-world data. In particular, we show that exploiting the correlation between audio and video modalities in edited talk-show videos improves speaker diarization results.
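
To make the idea of a soft co-factorization concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract does not specify the cost function or the coupling penalty, so the code assumes a Euclidean fit term for each modality, a squared-difference penalty that pulls the two activation matrices toward each other, and two data matrices that share the same number of columns (time frames) but may have different numbers of rows (features). The function name `soft_co_nmf` and the parameters `lam`, `rank`, and `eps` are hypothetical. The multiplicative updates are the standard MM-style updates for a nonnegative quadratic problem.

```python
import numpy as np


def soft_co_nmf(V1, V2, rank, lam=1.0, n_iter=200, eps=1e-12, seed=0):
    """Illustrative soft co-factorization (assumed model, not the paper's).

    Minimizes, over nonnegative W1, H1, W2, H2:
        0.5*||V1 - W1 H1||^2 + 0.5*||V2 - W2 H2||^2 + 0.5*lam*||H1 - H2||^2
    V1 (F1 x N) and V2 (F2 x N) may have different feature dimensions,
    but are assumed to share the same N columns (e.g. time frames).
    """
    if V1.shape[1] != V2.shape[1]:
        raise ValueError("V1 and V2 must have the same number of columns")
    rng = np.random.default_rng(seed)
    n_cols = V1.shape[1]
    W1 = rng.random((V1.shape[0], rank)) + eps
    W2 = rng.random((V2.shape[0], rank)) + eps
    H1 = rng.random((rank, n_cols)) + eps
    H2 = rng.random((rank, n_cols)) + eps

    def objective():
        fit1 = 0.5 * np.sum((V1 - W1 @ H1) ** 2)
        fit2 = 0.5 * np.sum((V2 - W2 @ H2) ** 2)
        coupling = 0.5 * lam * np.sum((H1 - H2) ** 2)
        return fit1 + fit2 + coupling

    history = [objective()]
    for _ in range(n_iter):
        # Activations: the penalty adds lam*H_other to the numerator and
        # lam*H_self to the denominator, softly tying H1 and H2 together.
        H1 *= (W1.T @ V1 + lam * H2 + eps) / (W1.T @ W1 @ H1 + lam * H1 + eps)
        H2 *= (W2.T @ V2 + lam * H1 + eps) / (W2.T @ W2 @ H2 + lam * H2 + eps)
        # Dictionaries: plain Euclidean NMF updates, one per modality.
        W1 *= (V1 @ H1.T + eps) / (W1 @ (H1 @ H1.T) + eps)
        W2 *= (V2 @ H2.T + eps) / (W2 @ (H2 @ H2.T) + eps)
        history.append(objective())
    return W1, H1, W2, H2, history
```

With `lam=0` the two factorizations are independent; as `lam` grows, `H1` and `H2` are pushed toward a single shared activation matrix. This sketch ignores the scale ambiguity between each dictionary and its activations, which a practical implementation would need to address (for example by normalizing dictionary columns).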

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 3537-3541
Number of pages: 5
Publication status: Published - 18 Oct 2013
Externally published: Yes
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 to 31/05/13

Keywords

  • Nonnegative matrix factorization
  • co-factorization
  • multimodality
  • speaker diarization
