Comparing audio and video segmentations for music videos indexing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Music videos are good examples of multimedia documents in which the structures of the audio and video streams are highly correlated. This paper presents a system that matches these structures and extracts audio-visual correlation measures. The audio and video streams are independently segmented at two levels: shots (sections for the audio stream) and events. Audio segmentation is performed at the event level by detecting onsets, and at the section level by a novelty-detection algorithm that identifies instrumentation changes. Video segmentation is performed at the event level by detecting changes in the motion intensity descriptor, and at the shot level by a classical histogram-based shot-detection algorithm. Audio-visual correlation measures are then computed on the extracted structures. Possible applications include audio/video stream resynchronization, video retrieval from audio content, and classification of music videos by genre.
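The "classical histogram-based shot detection" mentioned above is a standard technique rather than a contribution of this paper; a minimal sketch follows, assuming grayscale frames and an L1 histogram-distance criterion. The bin count and threshold are hypothetical tuning parameters, not values taken from the paper.

```python
import numpy as np

def detect_shot_boundaries(frames, bins=16, threshold=0.5):
    """Classical histogram-based shot detection (sketch).

    A shot boundary is declared wherever the L1 distance between the
    normalised gray-level histograms of consecutive frames exceeds
    `threshold`. `frames` is a sequence of 2-D uint8 arrays; `bins`
    and `threshold` are illustrative defaults, not the paper's values.
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        # Gray-level histogram, normalised to a probability distribution
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            boundaries.append(i)  # first frame of the new shot
        prev_hist = hist
    return boundaries

# Synthetic clip: 5 dark frames followed by 5 bright frames → one cut
frames = ([np.full((8, 8), 10, dtype=np.uint8)] * 5
          + [np.full((8, 8), 200, dtype=np.uint8)] * 5)
print(detect_shot_boundaries(frames))  # → [5]
```

Normalising the histograms makes the distance independent of frame size, so the same threshold works across resolutions; production detectors typically add an adaptive threshold to cope with gradual transitions such as fades.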

Original language: English
Title of host publication: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
Pages: V21-V24
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: 14 May 2006 – 19 May 2006

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 5
ISSN (Print): 1520-6149

Conference

Conference: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country/Territory: France
City: Toulouse
Period: 14/05/06 – 19/05/06
