Data-driven high-level information for text-independent speaker verification

  • Asmaa El Hannani
  • Dijana Petrovska-Delacrétaz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Recent studies have shown that high-level features, such as linguistic content, pronunciation, and idiolectal word usage, convey additional speaker information and can be combined with low-level features to increase the robustness of a speaker verification system. These features are usually extracted by analyzing streams produced by phonetic speech recognition systems. Two major problems arise when phone-based systems are developed: possible mismatches between the development and evaluation data, and the lack of transcribed databases. In this paper we propose replacing the phone-based approaches with data-driven segmentation methodologies. Our data-driven high-level systems do not use transcribed data and can easily be applied to development data, minimizing the mismatches. These systems were fused with a state-of-the-art acoustic Gaussian Mixture Model (GMM) system. Results obtained on the NIST 2006 Speaker Recognition Evaluation data show that the data-driven features provide complementary information and that the resulting fused system reduces the error rate compared with the GMM baseline system.

Original language: English
Title of host publication: 2007 IEEE Workshop on Automatic Identification Advanced Technologies - Proceedings
Pages: 209-213
Number of pages: 5
Publication status: Published - 2 Oct 2007
Externally published: Yes
Event: 2007 IEEE Workshop on Automatic Identification Advanced Technologies, AUTOID 2007 - Alghero, Italy
Duration: 7 Jun 2007 to 8 Jun 2007

Publication series

Name: 2007 IEEE Workshop on Automatic Identification Advanced Technologies - Proceedings

Conference

Conference: 2007 IEEE Workshop on Automatic Identification Advanced Technologies, AUTOID 2007
Country/Territory: Italy
City: Alghero
Period: 7/06/07 to 8/06/07
