Speaker: Navneeth K (EE12S065)
The research focuses on improving the recognition performance of Automatic Speech Recognition (ASR) systems in noisy environments. The performance of ASR systems is known to degrade under noisy test conditions: if the acoustic model is trained on features extracted from clean utterances, the features of noisy test utterances are acoustically mismatched with the trained statistical model. Feature normalization techniques such as Cepstral Mean Normalization (CMN), Cepstral Mean and Variance Normalization (CMVN) and Histogram Equalization (HEQ) are known to perform well under noisy conditions. However, the performance of HEQ degrades when test utterances are short, as the amount of data is insufficient to estimate a robust histogram for the test utterance.
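To make the HEQ idea concrete, here is a minimal sketch of quantile-based histogram equalization: each test feature value is mapped through the empirical test CDF and then through the inverse reference CDF, dimension by dimension. This is a generic illustration using NumPy, not the specific implementation evaluated in the talk; the function name and the rank-based CDF estimate are illustrative choices.

```python
import numpy as np

def histogram_equalize(test_feats, ref_feats):
    """Quantile mapping: transform each test feature dimension so its
    empirical distribution matches that of the reference features.

    test_feats : (num_frames, num_dims) test utterance features
    ref_feats  : (num_ref_frames, num_dims) reference (training) features
    """
    out = np.empty(test_feats.shape, dtype=float)
    for d in range(test_feats.shape[1]):
        x = test_feats[:, d]
        # Empirical CDF value (mid-rank) of each test frame, in (0, 1).
        ranks = (np.argsort(np.argsort(x)) + 0.5) / len(x)
        # Inverse reference CDF: look up the matching reference quantile.
        out[:, d] = np.quantile(ref_feats[:, d], ranks)
    return out
```

Because the mapping is monotone per dimension, frame ordering within each dimension is preserved while the marginal distribution is pulled toward the reference. The failure mode motivating the talk is visible here: with very few test frames, the empirical CDF (the `ranks` estimate) is noisy, so the equalized features are unreliable.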
We propose an algorithm to obtain a smooth histogram even for a short test utterance. The smooth histogram is obtained as a weighted linear combination of smooth histograms pre-computed during training, where the weights are computed inexpensively. Experiments show that using this smooth histogram for equalization consistently improves performance over conventional HEQ on the Aurora2 (short utterances) and Aurora4 databases, using both continuous density hidden Markov model (CDHMM) and deep neural network (DNN) acoustic models. We obtain relative WER reductions of 23.58%, 14.91% and 24.12% for short utterances in the Aurora2 database using a CDHMM, a DNN with mel-frequency cepstral coefficients (MFCC), and a DNN with log mel filter bank features (FBANK), respectively.
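The core idea, combining pre-computed smooth histograms with cheap weights, can be sketched as follows. The abstract does not specify how the weights are computed, so the inverse-distance weighting below is purely an illustrative stand-in for the paper's actual (inexpensive) weight computation; the function name and binning are also assumptions.

```python
import numpy as np

def smooth_histogram(test_frames, ref_hists, bins):
    """Estimate a smooth histogram for a short test utterance as a
    weighted linear combination of pre-computed reference histograms.

    test_frames : 1-D array of feature values from the (short) test utterance
    ref_hists   : (num_refs, num_bins) smooth density histograms from training
    bins        : bin edges shared by all histograms
    """
    # Coarse, noisy histogram from the few available test frames.
    test_hist, _ = np.histogram(test_frames, bins=bins, density=True)
    # Illustrative cheap weights: reference histograms closer to the
    # coarse test histogram receive larger weight (NOT the paper's method).
    dists = np.linalg.norm(ref_hists - test_hist, axis=1)
    w = 1.0 / (dists + 1e-8)
    w /= w.sum()
    # Weighted linear combination of the smooth reference histograms.
    return w @ ref_hists
```

Since each reference histogram is a proper density and the weights form a convex combination, the result remains a valid, smooth density estimate even when `test_frames` contains only a handful of frames.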