TY - GEN
T1 - Non-linear spectral subtraction (NSS) and hidden Markov models for robust speech recognition in car noise environments
AU - Lockwood, P.
AU - Boudy, J.
AU - Blanchet, M.
N1 - Publisher Copyright: © 1992 IEEE.
PY - 1992/1/1
Y1 - 1992/1/1
N2 - Achieving reliable performance for a speech recogniser is an important challenge, especially in the context of mobile telephony applications where the user can access telephone functions through voice. This paper addresses the problem of speaker-dependent discrete utterance recognition in noise. Special reference is made to the mismatch effects due to the fact that training and testing are made in different environments. This contribution extends recently published work [11] where a robust HMM training/recognition framework is proposed. The present contribution introduces several new aspects: use of enhanced NSS schemes, introduction of root-MFCC parameters, use of dynamic features, training of HMMs by a dynamic inference scheme (DIHMM). These enhancements are discussed from tests performed on band-limited signals (200-3000 Hz). We show that these various optimisations allow a rise from 20% to over 99% in performance. A 93% recognition rate is already achievable on raw data using a weighted modified projection and a root-MFCC dynamic representation.
AB - Achieving reliable performance for a speech recogniser is an important challenge, especially in the context of mobile telephony applications where the user can access telephone functions through voice. This paper addresses the problem of speaker-dependent discrete utterance recognition in noise. Special reference is made to the mismatch effects due to the fact that training and testing are made in different environments. This contribution extends recently published work [11] where a robust HMM training/recognition framework is proposed. The present contribution introduces several new aspects: use of enhanced NSS schemes, introduction of root-MFCC parameters, use of dynamic features, training of HMMs by a dynamic inference scheme (DIHMM). These enhancements are discussed from tests performed on band-limited signals (200-3000 Hz). We show that these various optimisations allow a rise from 20% to over 99% in performance. A 93% recognition rate is already achievable on raw data using a weighted modified projection and a root-MFCC dynamic representation.
U2 - 10.1109/ICASSP.1992.225921
DO - 10.1109/ICASSP.1992.225921
M3 - Conference contribution
AN - SCOPUS:85009284942
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 265
EP - 268
BT - ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1992 International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1992
Y2 - 23 March 1992 through 26 March 1992
ER -