Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (3): 172-180. DOI: 10.3778/j.issn.1002-8331.2104-0258

• Pattern Recognition and Artificial Intelligence •

BiLSTM_CNN Classification Model Based on Self-Attention and Residual Network

YANG Xingrui, ZHAO Shouwei, ZHANG Ruxue, YANG Xingjun, TAO Yehui   

  1. School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
  2. College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044, China
  3. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2022-02-01  Published: 2022-01-28

Abstract: It is difficult for bi-directional long short-term memory (BiLSTM) and convolutional neural network (CNN) models to extract sufficient text information in multi-class text classification tasks. A compound BiLSTM_CNN model based on the self-attention mechanism and a residual network (ResNet) is proposed. Self-attention assigns weights to the feature information produced by the convolution operation; the pooled features are then layer-normalized and fed into the residual network, allowing the model to learn residual information and further improving classification performance. During computation, the smoother Mish nonlinear activation function is used in place of the common ReLU. Compared with mainstream deep learning models, the proposed method is superior in terms of accuracy and F1 score, providing a new research direction for text classification problems.
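
The abstract describes the pipeline (BiLSTM → convolution with Mish → self-attention weighting → pooling → layer normalization → residual connection → classifier) but fixes no hyperparameters or exact wiring. The following is a minimal PyTorch sketch of that pipeline, not the authors' implementation: the vocabulary size, embedding dimension, hidden size, filter count, kernel width, and the point where the residual connection is added are all assumptions made for illustration. Mish(x) = x · tanh(ln(1 + eˣ)), which is smooth everywhere, unlike ReLU.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMCNN(nn.Module):
    """Sketch of the BiLSTM_CNN model: BiLSTM -> CNN (Mish) ->
    self-attention -> pooling -> layer norm -> residual -> classifier."""
    def __init__(self, vocab_size, num_classes, embed_dim=128,
                 hidden=64, n_filters=100, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, n_filters, kernel,
                              padding=kernel // 2)
        self.attn = nn.MultiheadAttention(n_filters, num_heads=1,
                                          batch_first=True)
        self.norm = nn.LayerNorm(n_filters)
        self.fc = nn.Linear(n_filters, num_classes)

    def forward(self, x):                         # x: (batch, seq) token ids
        h, _ = self.bilstm(self.embed(x))         # (batch, seq, 2*hidden)
        # Mish(x) = x * tanh(softplus(x)) replaces ReLU after convolution.
        c = F.mish(self.conv(h.transpose(1, 2)))  # (batch, n_filters, seq)
        c = c.transpose(1, 2)                     # (batch, seq, n_filters)
        a, _ = self.attn(c, c, c)                 # self-attention re-weights conv features
        p = a.max(dim=1).values                   # max pooling over time
        z = self.norm(p)                          # layer normalization
        z = z + c.max(dim=1).values               # residual connection (one plausible wiring)
        return self.fc(z)                         # class logits

# Example: a batch of 8 sequences of 50 tokens, 10 target classes.
model = BiLSTMCNN(vocab_size=5000, num_classes=10)
logits = model(torch.randint(0, 5000, (8, 50)))   # shape (8, 10)

Adding the residual after pooling keeps the skip path and the attention path at the same shape (batch, n_filters); the paper may instead apply the skip connection before pooling, which would work equally well shape-wise.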

Key words: self-attention mechanism, bi-directional long short-term memory network (BiLSTM), residual network, convolutional neural network (CNN), layer normalization