End-to-End Mandarin Speech Recognition with Improved Convolution Input

doi:10.3778/j.issn.1002-8331.1805-0486

Abstract

Abstract: Mainstream neural network training uses the cross-entropy criterion, which classifies and optimizes each frame of acoustic data, whereas continuous speech recognition is evaluated by sequence-level transcription accuracy. To address this mismatch, this paper builds an end-to-end speech recognition system based on sequence-level transcription. To counter poor system performance under low-resource corpus conditions, the model processes the input features with a convolutional neural network: the best-performing network structure is selected, and two-dimensional convolution is applied over the time and frequency domains, which reduces the effect of small perturbations in the input space caused by differing environments and speakers. The network also uses batch normalization to reduce generalization error and speed up training. Finally, using a large-scale language model, the hyper-parameters of the decoding process are optimized to improve modeling performance. Experimental results show that system performance improves by about 24%, outperforming mainstream speech recognition systems.

Key words: sequence-level, low-resource, end-to-end, convolutional neural network, batch normalization
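
The two input-side techniques named in the abstract, two-dimensional convolution over the time and frequency axes followed by batch normalization, can be illustrated with a minimal pure-Python sketch. This is not the authors' network: the toy batch, the 2x2 averaging kernel, and the function names `conv2d_valid` and `batch_norm` are assumptions made for illustration, and the learned scale and shift parameters of a real batch normalization layer are omitted.

```python
import math


def conv2d_valid(x, kernel):
    """2-D 'valid' cross-correlation over a (time, frequency) feature map."""
    kt, kf = len(kernel), len(kernel[0])
    out_t = len(x) - kt + 1
    out_f = len(x[0]) - kf + 1
    return [
        [
            sum(x[t + i][f + j] * kernel[i][j]
                for i in range(kt) for j in range(kf))
            for f in range(out_f)
        ]
        for t in range(out_t)
    ]


def batch_norm(maps, eps=1e-5):
    """Normalize one channel to zero mean, unit variance across the batch."""
    values = [v for m in maps for row in m for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[[(v - mean) * scale for v in row] for row in m] for m in maps]


# Hypothetical toy batch: 2 utterances, each 4 frames x 3 frequency bins.
batch = [
    [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]],
    [[5.0, 4.0, 3.0], [4.0, 3.0, 2.0], [3.0, 2.0, 1.0], [2.0, 1.0, 0.0]],
]
kernel = [[0.25, 0.25], [0.25, 0.25]]  # illustrative 2x2 averaging filter

# Each filter output mixes neighbouring frames AND neighbouring frequency bins.
features = [conv2d_valid(x, kernel) for x in batch]
normalized = batch_norm(features)
```

Because each output value pools a small time-frequency patch, a slight shift of energy along either axis changes the output only a little, which is the intuition behind the abstract's claim about small perturbations from environment and speaker.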

摘要： 主流神经网络训练的交叉熵准则是对声学数据的每个帧进行分类优化，而连续语音识别是以序列级转录准确性为性能度量。针对这个不同，构建基于序列级转录的端到端语音识别系统。针对低资源语料条件下系统性能不佳的问题，其中模型使用卷积神经网络对输入特征进行处理，选取最佳的网络结构，在时域和频域进行二维卷积，从而改善输入空间中因不同环境和说话人产生的小扰动影响。同时神经网络使用批量归一化技术来减少泛化误差，加速训练。基于大型的语言模型，优化解码过程中的超参数，提高模型建模效果。实验结果表明系统性能提升约24%，优于主流语音识别系统。

关键词: 序列级, 低资源, 端到端, 卷积神经网络, 批量归一化

WANG Yanzhe, ZHANG Limin, ZHANG Bingqiang, LI Zhenyu. End-to-End Mandarin Speech Recognition with Improved Convolution Input[J]. Computer Engineering and Applications, 2019, 55(17): 143-149.

王彦哲，张立民，张兵强，李振宇. 改进卷积输入的端到端普通话语音识别[J]. 计算机工程与应用, 2019, 55(17): 143-149.

