Most state of the art automatic speech recognition (ASR) systems are typically based on continuous Hidden Markov Models (HMMs) as acoustic modeling technique. It has been shown that the performance of HMM speech recognizers may be affected by a bad choice of the type of acoustic feature parameters in the acoustic front end module. For these reasons, we propose in this paper a dedicated isolated word recognition system based on HMMs which was carefully optimized specifically at the acoustic analysis and HMM acoustical modeling levels. Such conception was tested and valued on Hidden Markov model toolkit platform (HTK). Systems performances were evaluated using the TIMIT database. One comparative study was carried out using two types of speech analysis: The cepstral method referred to as Mel frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coding are used for different tests so as to evaluate and reinforce our conception. The frame shift duration effect of the acoustic analysis as well as the addition of the dynamic coefficients of the acoustic parameters (MFCC and PLP) were carefully tested in order to look for high accuracy for our optimized isolated word recognition (IWR) system. Finally, various experiments related to the HMM topology have been carried out in order to get better recognition accuracies. In fact, the effect of some modeling parameters of HMM on the recognition accuracy of the IWR system such as the number of states as well as the number of Gaussian mixtures were analyzed in order to get the optimal HMM topology.
Key words: Isolated word recognition, perceptual linear predictive (PLP) coding, Mel frequency cepstral coefficients (MFCC) PLP, HMM, Hidden Markov model toolkit platform (HTK).
Abbreviations: ASR, automatic speech recognition; HMM, hidden Markov models;MFCC, mel frequency cepstral coefficient; PLP, perceptual linear predictive; IWR, Isolated word recognition; FIR, finite impulse response; P(W/O), posteriori probability.
Copyright © 2023 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0