Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (15): 101-110. DOI: 10.3778/j.issn.1002-8331.2304-0255

• Pattern Recognition and Artificial Intelligence •

Sentiment Classification Model Based on Non-Negative Sinusoidal Positional Encoding and Hybrid Attention Mechanism

ZHENG Zhichao, CHEN Jindong, ZHANG Jian

  1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China
    2. School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China
    3. Beijing International Science and Technology Cooperation Base of Intelligent Decision and Big Data Application, Beijing 100192, China
  • Online: 2024-08-01  Published: 2024-07-30

Sentiment Classification Model Based on Non-Negative Sinusoidal Positional Encoding and Hybrid Attention Mechanism

ZHENG Zhichao, CHEN Jindong, ZHANG Jian   

  1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China
    2. School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China
    3. Beijing International Science and Technology Cooperation Base of Intelligent Decision and Big Data Application, Beijing 100192, China
  • Online: 2024-08-01  Published: 2024-07-30

Abstract: To address the problems that, in sentiment analysis tasks, sequence models have difficulty obtaining the relative positional information of text and tend to lose key information when processing long sequences, a bi-directional long short-term memory (Bi-LSTM) sentiment analysis model, NSPEHA-BiLSTM, which integrates non-negative sinusoidal position encoding (NSPE) and a hybrid attention mechanism (HAM), is proposed. The NSPE method is proposed to construct an NSPE for each word and incorporate relative positional information into the word vectors; text features are extracted by Bi-LSTM, and HAM weights their global and local components to ensure the accurate transmission of key information; text sentiment classification is then performed through a fully connected layer. On the IMDB dataset, NSPEHA-BiLSTM improves accuracy over Bi-LSTM and Text-CNN by 4.67 and 2.02 percentage points, respectively, and the longer the input text, the better the model performs; the results also verify that NSPE outperforms other position encodings.

Key words: sentiment analysis, bi-directional long short-term memory (Bi-LSTM), non-negative sinusoidal position encoding (NSPE), hybrid attention mechanism (HAM)
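
The abstract describes NSPE only at a high level and does not give its formula. Purely as an illustrative assumption, one plausible non-negative variant rescales the standard sinusoidal position encoding from [-1, 1] to [0, 1] before it is added to the word vector:

\[
\mathrm{NSPE}(pos, 2i) = \frac{1 + \sin\!\left(pos / 10000^{2i/d}\right)}{2}, \qquad
\mathrm{NSPE}(pos, 2i+1) = \frac{1 + \cos\!\left(pos / 10000^{2i/d}\right)}{2}
\]

where pos is the word position, i indexes the encoding dimension, and d is the embedding dimension; under this assumption the encoded input is the word embedding plus NSPE(pos).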

Abstract: NSPEHA-BiLSTM, a bi-directional long short-term memory (Bi-LSTM) sentiment classification model that integrates non-negative sinusoidal position encoding (NSPE) and a hybrid attention mechanism (HAM), is proposed to address two issues of sequence models in sentiment analysis tasks: the difficulty of obtaining the relative positional information of text and the loss of critical information when processing long sequences. NSPE incorporates relative positional information into the word embeddings, text features are then extracted by Bi-LSTM, and HAM weights the global and local features to ensure the accurate transmission of critical information; sentiment classification is finally performed by a fully connected layer. On the IMDB dataset, NSPEHA-BiLSTM achieves accuracy 4.67 and 2.02 percentage points higher than Bi-LSTM and Text-CNN, respectively, and the longer the input text, the better the model performs. The results also verify that NSPE is superior to other position encodings.

Key words: sentiment analysis, bi-directional long short-term memory (Bi-LSTM), non-negative sinusoidal position encoding (NSPE), hybrid attention mechanism (HAM)
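
The abstract outlines a pipeline of NSPE-augmented embeddings, Bi-LSTM feature extraction, hybrid global/local attention weighting, and a fully connected classifier. The following is a minimal PyTorch sketch of such a pipeline, not the authors' published implementation: the rescaled sinusoidal encoding sketched above, the 0.5/0.5 fusion of the two attention weights, the windowed convolution used for the local branch, and all layer sizes are assumptions for illustration.

import math
import torch
import torch.nn as nn


def non_negative_sinusoidal_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Assumed NSPE: standard sinusoidal encoding rescaled from [-1, 1] to [0, 1]."""
    position = torch.arange(max_len).unsqueeze(1).float()                  # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))                 # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return (pe + 1.0) / 2.0                                                # non-negative shift (assumption)


class NSPEHABiLSTM(nn.Module):
    def __init__(self, vocab_size, d_model=128, hidden=128, num_classes=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        # NSPE is fixed (non-trainable) and added to the word embeddings.
        self.register_buffer("nspe", non_negative_sinusoidal_encoding(max_len, d_model))
        self.bilstm = nn.LSTM(d_model, hidden, batch_first=True, bidirectional=True)
        # "Hybrid" attention here = a global (sentence-level) soft attention plus a
        # local (windowed) attention; the paper's exact formulation may differ.
        self.global_score = nn.Linear(2 * hidden, 1)
        self.local_conv = nn.Conv1d(2 * hidden, 1, kernel_size=3, padding=1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):                                          # (batch, seq_len)
        seq_len = token_ids.size(1)
        x = self.embed(token_ids) + self.nspe[:seq_len]                    # inject positional information
        h, _ = self.bilstm(x)                                              # (batch, seq_len, 2*hidden)
        g = torch.softmax(self.global_score(h).squeeze(-1), dim=1)         # global attention weights
        l = torch.softmax(self.local_conv(h.transpose(1, 2)).squeeze(1), dim=1)  # local attention weights
        weights = 0.5 * g + 0.5 * l                                        # simple fusion (assumption)
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)            # weighted sum of features
        return self.fc(context)                                            # sentiment logits

As a usage sketch, NSPEHABiLSTM(vocab_size=20000)(torch.randint(1, 20000, (4, 200))) returns a (4, 2) tensor of sentiment logits for a batch of four 200-token reviews.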