End to End Speech Recognition Based on ResNet-BLSTM

Computer Engineering and Applications, 2020, Vol. 56, Issue 18: 124-130. DOI: 10.3778/j.issn.1002-8331.1907-0019

HU Zhangfang, XU Xuan, FU Yaqin, XIA Zhiguang, MA Sudong

1. School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Online: 2020-09-15 Published: 2020-09-10

Abstract

In deep-learning-based end-to-end speech recognition models, the input consists of fixed-length speech frames. This loses time-domain information and part of the high-frequency information, which in turn leads to a low recognition rate and poor robustness. To address these problems, this paper proposes a model that combines a residual network (ResNet) with a bidirectional long short-term memory network (BLSTM). The model takes the spectrogram as input and uses parallel convolutional layers inside the residual network to extract features at different scales. These features are then fused, and connectionist temporal classification (CTC) is used for the final classification, yielding an end-to-end speech recognition model. Experimental results on the Aishell-1 speech corpus show that the character-level error rate of the proposed model, reported as WER, decreases by 2.52% compared with a traditional end-to-end model, and that the model is more robust.

Key words: Residual Network (ResNet), Bi-directional Long Short-Term Memory (BLSTM), parallel convolutional layer, connectionist temporal classification
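The last two steps named in the abstract, classification with connectionist temporal classification and scoring by error rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes greedy (best-path) decoding, a blank label with index 0, and a character-level error rate computed from Levenshtein distance. All function names are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Turn a per-frame label path into an output sequence.

    Applies the CTC best-path rule: merge consecutive repeated labels,
    then drop the blank label (index 0 here, an assumption).
    """
    output: List[int] = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev_row = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        row = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance divided by reference length (per character for Mandarin)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

In this sketch a blank between two identical labels keeps them separate, so the path `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to three labels rather than two. Passing strings of Chinese characters to `error_rate` gives a per-character rate, which matches the character-level metric the abstract refers to.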

HU Zhangfang, XU Xuan, FU Yaqin, XIA Zhiguang, MA Sudong. End to End Speech Recognition Based on ResNet-BLSTM[J]. Computer Engineering and Applications, 2020, 56(18): 124-130.

