A multimodal approach to initialisation for top-down speaker diarization of television shows

  • Simon Bozonnet
  • Félicien Vallet
  • Nicholas Evans
  • Slim Essid
  • Gaël Richard
  • Jean Carrive

Research output: Contribution to journal › Conference article › Peer-review

Abstract

This paper presents a new multimodal approach to speaker diarization of TV show data. We hypothesize that intraspeaker variation in visual information may be smaller than in the corresponding acoustic information, so that visual information may be better suited to speaker model initialisation. Initialisation is an acknowledged weakness of the computationally efficient top-down approach to speaker diarization used here. Experimental results show that a recently proposed approach to purification, combined with the new multimodal approach to initialisation, delivers relative improvements in diarization performance of 22% and 17% over the baseline system on independent development and evaluation datasets, respectively.
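The improvements quoted above are relative, not absolute: a relative improvement expresses the reduction in error as a fraction of the baseline error. The short Python sketch below illustrates the arithmetic, assuming performance is measured by diarization error rate (DER), the usual metric for this task. The function name `relative_improvement` and the DER values are hypothetical and are not the figures reported in the paper.

```python
def relative_improvement(baseline_der: float, new_der: float) -> float:
    """Relative reduction in diarization error rate (DER) as a fraction of the baseline.

    Both arguments must use the same units (e.g. percent).
    """
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical DER values chosen only to illustrate the arithmetic;
# they are NOT the results reported in the paper.
baseline = 25.0  # hypothetical baseline DER (%)
improved = 19.5  # hypothetical DER after purification + multimodal initialisation (%)
print(f"{relative_improvement(baseline, improved):.0%}")  # prints "22%"
```

Under this definition, a 22% relative improvement on a hypothetical 25% baseline DER corresponds to an absolute reduction of 5.5 percentage points.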

Original language: English
Pages (from-to): 581-585
Number of pages: 5
Journal: European Signal Processing Conference
Publication status: Published - 1 Dec 2010
Event: 18th European Signal Processing Conference, EUSIPCO 2010 - Aalborg, Denmark
Duration: 23 Aug 2010 to 27 Aug 2010
