%0 Journal Article
%A HU Zhangfang
%A XU Xuan
%A FU Yaqin
%A XIA Zhiguang
%A MA Sudong
%T End to End Speech Recognition Based on ResNet-BLSTM
%D 2020
%R 10.3778/j.issn.1002-8331.1907-0019
%J Computer Engineering and Applications
%P 124-130
%V 56
%N 18
%X

In end-to-end speech recognition models based on deep learning, the input consists of fixed-length speech frames, which loses time-domain information and part of the high-frequency information, leading to a low recognition rate and weak system robustness. To address this problem, this paper proposes a model based on ResNet and BLSTM. The model takes the spectrogram as input, uses parallel convolution layers designed within the residual network to extract features at different scales, fuses these features, and finally applies connectionist temporal classification (CTC) to realize an end-to-end speech recognition model. Experimental results show that, compared with the traditional end-to-end model, the WER of the proposed model decreases by 2.52% on the Aishell-1 speech dataset, and its robustness is better.

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1907-0019