TY - GEN
T1 - Data-driven high-level information for text-independent speaker verification
AU - El Hannani, Asmaa
AU - Petrovska-Delacrétaz, Dijana
PY - 2007/10/2
Y1 - 2007/10/2
N2 - Recently, various studies have shown that high-level features, such as linguistic content, pronunciation and idiolectal word usage, convey more speaker information and can be added to the low-level features in order to increase the robustness of the system. Usually these features are extracted by analyzing streams produced by phonetic speech recognition systems. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. We propose in this paper to replace the phone-based approaches with data-driven segmentation methodologies. Our data-driven high-level systems do not use transcribed data and can easily be applied on development data, minimizing the mismatches. These systems were fused with a state-of-the-art acoustic Gaussian Mixture Model (GMM) system. Results obtained on the NIST 2006 Speaker Recognition Evaluation data show that the data-driven features provide complementary information and that the resulting fused system reduced the error rate in comparison to the GMM baseline system.
AB - Recently, various studies have shown that high-level features, such as linguistic content, pronunciation and idiolectal word usage, convey more speaker information and can be added to the low-level features in order to increase the robustness of the system. Usually these features are extracted by analyzing streams produced by phonetic speech recognition systems. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. We propose in this paper to replace the phone-based approaches with data-driven segmentation methodologies. Our data-driven high-level systems do not use transcribed data and can easily be applied on development data, minimizing the mismatches. These systems were fused with a state-of-the-art acoustic Gaussian Mixture Model (GMM) system. Results obtained on the NIST 2006 Speaker Recognition Evaluation data show that the data-driven features provide complementary information and that the resulting fused system reduced the error rate in comparison to the GMM baseline system.
UR - https://www.scopus.com/pages/publications/34748823734
U2 - 10.1109/AUTOID.2007.380621
DO - 10.1109/AUTOID.2007.380621
M3 - Conference contribution
AN - SCOPUS:34748823734
SN - 1424412994
SN - 9781424412990
T3 - 2007 IEEE Workshop on Automatic Identification Advanced Technologies - Proceedings
SP - 209
EP - 213
BT - 2007 IEEE Workshop on Automatic Identification Advanced Technologies - Proceedings
T2 - 2007 IEEE Workshop on Automatic Identification Advanced Technologies, AUTOID 2007
Y2 - 7 June 2007 through 8 June 2007
ER -