Spoken WordCloud: Clustering recurrent patterns in speech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system, and the resulting text transcripts are then processed to create a summary. However, this approach might not be suitable in adverse acoustic conditions or when applied to languages with limited training resources. To address these limitations, in this paper we propose an automatic speech summarization method based on the automatic discovery of recurrent patterns in the speech: recurrent acoustic patterns are first extracted from the audio, then clustered and ranked according to their number of repetitions, creating an approximate acoustic summary of what was spoken. This approach allows us to build what we call a "Spoken WordCloud", named after its similarity to text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.
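The purity and inverse purity figures reported in the abstract are standard clustering-evaluation metrics: purity rewards clusters dominated by a single word label, while inverse purity rewards labels concentrated in a single cluster. A minimal sketch of how they could be computed is below; the function names and toy data are illustrative and not taken from the paper.

```python
from collections import Counter

def purity(clusters):
    """Purity: each cluster votes for its majority label.

    `clusters` is a list of lists of reference word labels assigned
    to the acoustic patterns grouped into each cluster.
    """
    total = sum(len(c) for c in clusters)
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / total

def inverse_purity(clusters, labels):
    """Inverse purity: each label votes for the cluster covering most of it."""
    total = sum(len(c) for c in clusters)
    return sum(max(c.count(lab) for c in clusters) for lab in labels) / total

# Toy example: three clusters of discovered word-like patterns.
clusters = [["hello", "hello", "world"], ["world", "world"], ["hello"]]
labels = {"hello", "world"}

print(f"purity = {purity(clusters):.3f}")                   # 5/6 ≈ 0.833
print(f"inverse purity = {inverse_purity(clusters, labels):.3f}")  # 4/6 ≈ 0.667
```

In this toy example, splitting the "hello" tokens across two clusters keeps purity high (each cluster is still homogeneous) but lowers inverse purity, mirroring the gap between the 90% and 71% scores in the abstract.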

Original language: English
Title of host publication: CBMi 2011 - 9th International Workshop on Content-Based Multimedia Indexing
Pages: 133-138
Number of pages: 6
DOIs
Publication status: Published - 6 Sept 2011
Externally published: Yes
Event: 9th International Workshop on Content-Based Multimedia Indexing, CBMi 2011 - Madrid, Spain
Duration: 13 Jun 2011 - 15 Jun 2011

Publication series

Name: Proceedings - International Workshop on Content-Based Multimedia Indexing
ISSN (Print): 1949-3991

Conference

Conference: 9th International Workshop on Content-Based Multimedia Indexing, CBMi 2011
Country/Territory: Spain
City: Madrid
Period: 13/06/11 - 15/06/11

