Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (18): 124-130. DOI: 10.3778/j.issn.1002-8331.1907-0019


End to End Speech Recognition Based on ResNet-BLSTM

HU Zhangfang, XU Xuan, FU Yaqin, XIA Zhiguang, MA Sudong   

  1. School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2020-09-15 Published: 2020-09-10





In deep-learning-based end-to-end speech recognition models, the input consists of fixed-length speech frames, which loses time-domain information and part of the high-frequency information, leading to a low recognition rate and weak system robustness. To address this problem, this paper proposes a model based on the residual network (ResNet) and bidirectional long short-term memory (BLSTM). The model takes the spectrogram as input and adds parallel convolutional layers to the residual network to extract features at different scales, which are then fused. Finally, connectionist temporal classification (CTC) is used for classification, yielding an end-to-end speech recognition model. Experimental results show that, compared with the traditional end-to-end model, the proposed model reduces the word error rate (WER) by 2.52% on the Aishell-1 speech dataset and is more robust.
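The two evaluation-side ideas named in the abstract, CTC output and WER, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes greedy (best-path) decoding, a blank label at index 0, and a plain Levenshtein-based error rate over tokens (words or characters). The names `ctc_greedy_decode`, `word_error_rate`, and `BLANK` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the top label in every frame,
    merge consecutive repeats, then drop blank labels."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """Edit distance (substitutions, insertions, deletions) between two
    token sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        row = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1] / len(reference)


# Toy per-frame scores over 3 labels (blank, 1, 2); values are made up.
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
hypothesis = ctc_greedy_decode(scores)
error = word_error_rate([1, 1, 2], hypothesis)
```

In this sketch the blank frame between the two runs of label 1 is what lets the decoder keep both occurrences instead of merging them. A beam search or language-model rescoring step could replace the greedy decoder; the abstract does not say which decoding strategy the paper uses.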

Key words: residual network (ResNet), bidirectional long short-term memory (BLSTM), parallel convolutional layer, connectionist temporal classification (CTC)


