TY - GEN
T1 - Unsupervised data-driven hidden Markov modeling for text-dependent speaker verification
AU - Petrovska-Delacrétaz, Dijana
AU - Khemiri, Houssemeddine
N1 - Publisher Copyright: Copyright © 2017 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
PY - 2017/1/1
Y1 - 2017/1/1
N2 - We present a text-dependent speaker verification system based on unsupervised data-driven Hidden Markov Models (HMMs) that takes into account the temporal information of speech data. The originality of our proposal is to train unsupervised HMMs on raw speech alone, without transcriptions; these models provide a pseudo-phonetic segmentation of the speech data. The proposed text-dependent system is composed of the following steps. First, generic unsupervised HMMs are trained. Then the enrollment speech data for each target speaker is segmented with the generic models, and further processing is done to obtain speaker- and text-adapted HMMs that represent each speaker. During the test phase, to verify the claimed identity of the speaker, the test speech is segmented with both the generic and the speaker-dependent HMMs. Finally, two approaches, based on log-likelihood ratio and on concurrent scoring, are proposed to compute the score between the test utterance and the speaker's model. The system is evaluated on Part 1 of the RSR2015 database using Equal Error Rate (EER) on the development set and Half Total Error Rate (HTER) on the evaluation set. An average EER of 1.29% is achieved on the development set, while the average HTER on the evaluation set is 1.32%.
KW - Concurrent scoring
KW - Hidden Markov models
KW - Text-dependent speaker verification
KW - Unsupervised data-driven modeling
UR - https://www.scopus.com/pages/publications/85049431654
U2 - 10.5220/0006202001990207
DO - 10.5220/0006202001990207
M3 - Conference contribution
AN - SCOPUS:85049431654
T3 - ICPRAM 2017 - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods
SP - 199
EP - 207
BT - ICPRAM 2017 - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods
A2 - De Marsico, Maria
A2 - Sanniti di Baja, Gabriella
A2 - Fred, Ana
PB - SciTePress
T2 - 6th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2017
Y2 - 24 February 2017 through 26 February 2017
ER -