Comparing data-driven and phonetic N-gram systems for text-independent speaker verification

Asmaa El Hannani, Dijana Petrovska-Delacrétaz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two major problems that arise when developing phone-based systems are possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied to development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we compare two sets of speaker verification systems: the first based on data-driven units and the second on phonetic units. Results obtained on the NIST 2006 Speaker Recognition Evaluation data show that the data-driven approach is comparable to the phonetic one and that further improvements can be achieved by combining both approaches.

Original language: English
Title of host publication: IEEE Conference on Biometrics
Subtitle of host publication: Theory, Applications and Systems, BTAS'07
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 1st IEEE International Conference on Biometrics: Theory, Applications, and Systems, BTAS '07 - Crystal City, VA, United States
Duration: 27 Sept 2007 to 29 Sept 2007

Publication series

Name: IEEE Conference on Biometrics: Theory, Applications and Systems, BTAS'07

Conference

Conference: 1st IEEE International Conference on Biometrics: Theory, Applications, and Systems, BTAS '07
Country/Territory: United States
City: Crystal City, VA
Period: 27/09/07 to 29/09/07
