Speaker identification systems perform convincingly on clean speech, but their performance degrades when speech samples are corrupted by narrowband noise. Block truncation of the cepstral coefficients ensures that not all features are affected by narrowband noise, but it cannot reduce the extent of the degradation. This work aims to improve the performance of speaker identification systems by block truncating features that have first been subjected to wavelet processing. Wavelet decomposition divides the energy spectrum of the speech signal into bands, the number of which is determined by the number of decomposition levels, thereby segregating the noise-affected bands from the others. In addition, the wavelet filters smooth the noisy speech signal, which helps identify the correct speaker. Dynamic Mel filtering of these wavelet coefficients followed by block truncation provides better identification, taking advantage of the fact that some filter bank coefficients remain unaffected by narrowband noise. The features are modeled with a Gaussian mixture model - universal background model (GMM-UBM), which serves as a generic model that is trained only once. The wavelet-based dynamic MFCC technique achieves a speaker identification accuracy of 97.23%, a 7.58% improvement over the non-wavelet block truncation method.
Keywords: Wavelet decomposition, block truncation, Dynamic Mel Filtering Cepstral Coefficients (DMFCC), Gaussian mixture model - universal background model (GMM-UBM), speaker identification.
Abbreviations: MFCC, Mel Frequency Cepstral Coefficients; UBM, Universal Background Model; DCT, Discrete Cosine Transform; MFLE, Mel Filter Log Energies.
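The claim in the abstract that block truncation keeps some features unaffected by narrowband noise can be illustrated with a minimal sketch. It assumes that block truncation means splitting the Mel filter log energies (MFLE) of a frame into consecutive blocks and applying the DCT to each block separately. The 20-filter frame, the two blocks of 10 filters, the noise offsets, and the helper names `dct2` and `block_cepstra` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def block_cepstra(mfle, block_sizes):
    """Split the log energies into consecutive blocks and DCT each block."""
    if sum(block_sizes) != len(mfle):
        raise ValueError("block sizes must cover all filter outputs")
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(dct2(mfle[start:start + size]))
        start += size
    return blocks


def count_changed(a, b, tol=1e-9):
    """Number of coefficients that differ between two vectors."""
    return sum(abs(u - v) > tol for u, v in zip(a, b))


# Hypothetical log energies of 20 Mel filters for one frame (made-up values).
clean = [math.log(1.0 + 0.5 * (i + 1)) for i in range(20)]

# Narrowband noise modelled as extra log energy in filters 3 and 4 only.
noisy = list(clean)
noisy[3] += 2.0
noisy[4] += 1.5

# Full-band DCT: the disturbance spreads across the cepstral coefficients.
full_changed = count_changed(dct2(clean), dct2(noisy))

# Block-wise DCT with an assumed split into two blocks of 10 filters.
sizes = [10, 10]
block_changed = [
    count_changed(c, n)
    for c, n in zip(block_cepstra(clean, sizes), block_cepstra(noisy, sizes))
]

print("coefficients changed, full DCT:", full_changed)
print("coefficients changed per block:", block_changed)
```

In this toy setting the noise alters only the coefficients of the block that contains the corrupted filters, while the other block is untouched. The altered coefficients are still degraded by the same amount, which is consistent with the abstract's remark that block truncation alone does not reduce the extent of the degradation.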
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.