Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (3): 172-180.DOI: 10.3778/j.issn.1002-8331.2104-0258

• Pattern Recognition and Artificial Intelligence •

BiLSTM_CNN Classification Model Based on Self-Attention and Residual Network

YANG Xingrui, ZHAO Shouwei, ZHANG Ruxue, YANG Xingjun, TAO Yehui   

  1. School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
  2. College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044, China
  3. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online: 2022-02-01    Published: 2022-01-28

Abstract: Bi-directional long short-term memory (BiLSTM) networks and convolutional neural networks (CNN) struggle to extract sufficient text information in multi-class text classification tasks. A compound BiLSTM_CNN model based on the self-attention mechanism and a residual network (ResNet) is therefore proposed. The self-attention mechanism assigns weights to the feature information produced by the convolution operation; the pooled features are then processed by layer normalization and fed into the residual connection, so that the model learns the residual information and its classification performance improves further. Throughout the computation, the Mish nonlinear activation function is used in place of the common ReLU, as it is smoother. Compared with common deep learning models, the proposed method outperforms existing mainstream models on both the accuracy and F1 evaluation metrics, offering a new line of research for text classification problems.
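The abstract specifies a sequential pipeline: embedding, BiLSTM, 1-D convolution, self-attention weighting, pooling, layer normalization, residual connection, classifier, with the smoother Mish activation, Mish(x) = x * tanh(softplus(x)), replacing ReLU. The PyTorch sketch below wires these stages together in that order; all layer sizes, the single attention head, the max-pooling choice, and the linear shortcut are illustrative assumptions rather than the paper's reported configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMCNNSelfAttnRes(nn.Module):
    # Sketch of the described pipeline: BiLSTM -> Conv1d -> self-attention
    # -> max pooling -> LayerNorm -> residual connection -> classifier.
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_filters=256, kernel_size=3, num_classes=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Convolution over BiLSTM outputs (2*hidden_dim input channels).
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters,
                              kernel_size, padding=kernel_size // 2)
        # Self-attention re-weights the convolved feature information.
        self.attn = nn.MultiheadAttention(num_filters, num_heads=1,
                                          batch_first=True)
        self.norm = nn.LayerNorm(num_filters)
        # Hypothetical linear shortcut so the residual branch matches shapes.
        self.shortcut = nn.Linear(2 * hidden_dim, num_filters)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, x):                          # x: (batch, seq) token ids
        h, _ = self.bilstm(self.embedding(x))      # (batch, seq, 2*hidden)
        c = F.mish(self.conv(h.transpose(1, 2)))   # Mish instead of ReLU
        c = c.transpose(1, 2)                      # (batch, seq, filters)
        a, _ = self.attn(c, c, c)                  # self-attention weighting
        pooled = self.norm(a.max(dim=1).values)    # pool, then layer-normalize
        res = self.shortcut(h.max(dim=1).values)   # shortcut from BiLSTM path
        out = F.mish(pooled + res)                 # learn residual information
        return self.fc(out)                        # class logits

For example, BiLSTMCNNSelfAttnRes(vocab_size=30000)(torch.randint(0, 30000, (4, 50))) returns a (4, 10) logit tensor, from which a cross-entropy loss and the accuracy/F1 metrics mentioned above would be computed.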

Key words: self-attention mechanism, bi-directional long short-term memory network (BiLSTM), residual network, convolutional neural network (CNN), layer normalization
