Research on noise robustness of speech recognition based on deep auto-encoder neural network

doi:10.3778/j.issn.1002-8331.1611-0217

Abstract: In speech recognition based on the traditional Radial Basis Function (RBF) neural network, the centers and radii of the basis functions are initialized randomly. To address this problem, and motivated by the layered mechanism by which the human brain processes speech, an unsupervised pre-training method that uses a large amount of unlabeled data to initialize the network parameters is proposed in place of traditional random initialization. A Deep Auto-Encoder (DAE) neural network is used as the acoustic model, and the noise robustness of speaker-independent, small-vocabulary isolated-word recognition is analyzed with Mel Frequency Cepstrum Coefficient (MFCC) and Gammatone Frequency Cepstrum Coefficient (GFCC) features. The experimental results show that, with MFCC features, the DAE network is more robust to noise than the RBF network. In addition, compared with the classical MFCC features, GFCC features give a relative improvement of 1.87% in average recognition rate with the DAE network.

Key words: speech recognition, robustness, Deep Auto-Encoder (DAE) neural network, Gammatone Frequency Cepstrum Coefficient (GFCC), Mel Frequency Cepstrum Coefficient (MFCC)
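
The unsupervised pre-training idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single hidden layer, the layer sizes, the learning rate, the helper name `pretrain_autoencoder`, and the synthetic 13-dimensional frames standing in for MFCC or GFCC features are all assumptions made here for illustration. An auto-encoder is trained to reconstruct unlabeled feature frames, and the learned encoder weights then serve as the starting point for a supervised network instead of random values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_autoencoder(X, n_hidden, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer auto-encoder on unlabeled frames X.

    Returns the encoder weights, the encoder bias, and the per-epoch
    mean squared reconstruction error. (Hypothetical helper.)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))   # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)               # hidden code
        X_hat = H @ W2 + b2                    # linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of 0.5 * sum(err**2) / n, computed before any update.
        g_out = err / n
        g_hid = (g_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return W1, b1, losses

# Synthetic stand-in for unlabeled feature frames (assumption: 13 coefficients
# per frame, generated from 3 latent factors plus noise, then standardized).
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 3))
A = rng.normal(size=(3, 13))
X = Z @ A + 0.1 * rng.normal(size=(200, 13))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# W_init and b_init would initialize the first layer of a supervised
# classifier in place of randomly drawn parameters.
W_init, b_init, losses = pretrain_autoencoder(X, n_hidden=8)
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A deeper network would typically repeat this step layer by layer, feeding each layer's hidden code to the next auto-encoder, before supervised fine-tuning on the labeled isolated-word data.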


HUANG Lixia, WANG Yanan, ZHANG Xueying, WANG Hongcui. Research on noise robustness of speech recognition based on deep auto-encoder neural network[J]. Computer Engineering and Applications, 2017, 53(13): 49-54.


