TY - GEN
T1 - Comparing audio and video segmentations for music videos indexing
AU - Gillet, Olivier
AU - Richard, Gaël
PY - 2006/12/1
Y1 - 2006/12/1
N2 - Music videos are good examples of multimedia documents in which the structures of the audio and video streams are highly correlated. This paper presents a system that matches these structures and extracts audio-visual correlation measures. The audio and video streams are independently segmented at two levels: shots (sections for audio) and events. Audio segmentation is performed at the event level by detecting onsets, and at the section level by a novelty detection algorithm identifying instrumentation changes. Video segmentation is performed at the event level by detecting changes in the motion intensity descriptor, and at the shot level by using a classical histogram-based shot detection algorithm. Audio-visual correlation measures are computed on the extracted structures. Possible applications include audio/video stream resynchronization, video retrieval from audio content, or classification of music videos by genre.
UR - https://www.scopus.com/pages/publications/33947677253
M3 - Conference contribution
AN - SCOPUS:33947677253
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - V21-V24
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -