Automatic speech recognition based on time domain modeling

doi:10.3778/j.issn.1002-8331.1708-0016

Computer Engineering and Applications, 2017, Vol. 53, Issue 20: 243-248.

WANG Haikun, WU Dayong, LIU Jiang, WANG Shijin, HU Guoping, HU Yu

Research Institute, iFLYTEK Co., Ltd., Hefei 230088, China

Online: 2017-10-15  Published: 2017-10-31

Abstract

End-to-end neural networks can automatically learn the transformation from raw data to features for a specific task, which resolves the mismatch between hand-designed features and the task. Previous end-to-end networks for speech recognition use a single-layer time-domain convolutional network as the feature extraction model, and recurrent neural networks and fully connected feed-forward deep neural networks as the acoustic model, which limits both performance and efficiency. To improve the performance of the feature extraction module and the training efficiency of the acoustic model, an end-to-end speech recognition model is proposed that combines a multi-time-frequency-resolution convolutional network with a feed-forward neural network with memory modules. Experimental results show that, on a test set of real recordings, the proposed method reduces the word error rate by 10% and the training time by 80% compared with the traditional method.

Key words: convolutional neural network, recurrent neural network, acoustic model, end-to-end model
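As a rough illustration of the two components named in the abstract, the sketch below applies time-domain filters of several lengths to a raw waveform and concatenates their outputs frame by frame, then passes the frames through a feed-forward memory block that adds a weighted sum of past frames. The filter lengths, hop size, tap weights, and function names are all hypothetical, since the abstract gives no such details. The memory block follows the general idea of a fixed-length weighted sum over history, which is only an assumption about the paper's actual design.

```python
import random


def conv_branch(wave, kernel, hop, n_frames):
    """Apply one time-domain filter every `hop` samples, followed by ReLU."""
    feats = []
    for f in range(n_frames):
        start = f * hop
        window = wave[start:start + len(kernel)]
        acc = sum(w * k for w, k in zip(window, kernel))
        feats.append(max(acc, 0.0))
    return feats


def multi_resolution_features(wave, kernels, hop):
    """Concatenate the outputs of filters with different lengths, frame by frame.

    Short filters resolve fine timing, long filters resolve fine frequency
    detail. All branches share one hop so their frames stay aligned.
    """
    longest = max(len(k) for k in kernels)
    n_frames = (len(wave) - longest) // hop + 1
    branches = [conv_branch(wave, k, hop, n_frames) for k in kernels]
    return [list(frame) for frame in zip(*branches)]


def memory_block(frames, taps):
    """Add a weighted sum of past frames to each frame.

    taps[0] weights the previous frame, taps[1] the one before it, and so on.
    No output depends on another output, so all frames could be computed in
    parallel, unlike the step-by-step update of a recurrent network.
    """
    out = []
    for t, frame in enumerate(frames):
        mem = list(frame)
        for i, a in enumerate(taps, start=1):
            if t - i >= 0:
                mem = [m + a * p for m, p in zip(mem, frames[t - i])]
        out.append(mem)
    return out


random.seed(0)
# Toy stand-in for raw audio samples.
wave = [random.uniform(-1.0, 1.0) for _ in range(400)]
# Random stand-ins for learned filters at three hypothetical resolutions.
kernels = [[random.gauss(0.0, 0.1) for _ in range(n)] for n in (16, 32, 64)]

feats = multi_resolution_features(wave, kernels, hop=8)
hidden = memory_block(feats, taps=[0.5, 0.25])
print(len(feats), len(feats[0]), len(hidden))
```

In a trained model the filters and tap weights would be learned jointly with the rest of the network; here they are fixed only so that the data flow can be inspected.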

WANG Haikun, WU Dayong, LIU Jiang, WANG Shijin, HU Guoping, HU Yu. Automatic speech recognition based on time domain modeling[J]. Computer Engineering and Applications, 2017, 53(20): 243-248.
