Chinese Short Text Sentiment Classification Model Integrating Dual-Channel Features

doi:10.3778/j.issn.1002-8331.2310-0020

Abstract

Chinese short texts are characterized by sparse features, frequent ambiguity, non-standard expression, and rich emotional content. Existing deep-learning models for Chinese short text sentiment classification extract text features insufficiently, and they focus on semantic information while ignoring syntactic information. To address these problems, a Chinese short text sentiment classification model that integrates dual-channel features is proposed. First, a pre-trained model produces dynamic word vectors, giving the model richer linguistic features and explicit syntactic information. Second, two channels extract text features from the dynamic word vectors. The upper channel uses an improved DPCNN network to extract rich long-distance dependencies in the text. The lower channel applies multi-head self-attention between the word features and the text features at each time step of a bidirectional long short-term memory (BiLSTM) network, learning fuller text features and paying more attention to the words that matter most for the classification result. Finally, the feature information from the two channels is concatenated to obtain the final text representation. Experimental results show that the model reaches accuracies of 96.54%, 92.05% and 94.3% on the ChnSentiCorp, Weibo review and e-commerce review datasets respectively, which are 2.28, 2.44 and 1.01 percentage points higher than the average accuracy of the comparison models. The proposed model effectively improves text classification accuracy and provides a new theoretical model for Chinese short text sentiment classification.

Key words: text sentiment classification, pre-trained model, deep learning, attention mechanism
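
The data flow described in the abstract (two channels reading the same word vectors, with their outputs concatenated before classification) can be illustrated with a toy, dependency-free sketch. This is not the authors' model: the improved DPCNN is replaced here by a simple sliding-window average with max-pooling, the BiLSTM is omitted, and multi-head self-attention is reduced to single-head scaled dot-product attention without learned projections. All function names, the example vectors, and the classifier weights are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def upper_channel(tokens, width=3):
    """Simplified stand-in for the convolutional channel (assumption):
    average each window of `width` token vectors, then max-pool over positions."""
    dim = len(tokens[0])
    n_windows = max(1, len(tokens) - width + 1)
    windows = [tokens[i:i + width] for i in range(n_windows)]
    conv = [[sum(v[d] for v in w) / len(w) for d in range(dim)] for w in windows]
    return [max(row[d] for row in conv) for d in range(dim)]

def lower_channel(tokens):
    """Simplified stand-in for the attention channel (assumption):
    single-head scaled dot-product self-attention with
    queries = keys = values = tokens, followed by mean-pooling."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return [sum(v[d] for v in attended) / len(attended) for d in range(dim)]

def classify(tokens, weights, bias):
    """Concatenate both channel outputs, then apply a linear layer and softmax."""
    fused = upper_channel(tokens) + lower_channel(tokens)  # list concatenation
    logits = [dot(row, fused) + b for row, b in zip(weights, bias)]
    return fused, softmax(logits)

# Hypothetical 4-token sentence with 2-dimensional "word vectors".
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
# Hypothetical classifier: 2 classes x 4 fused features.
weights = [[0.5, -0.2, 0.1, 0.3], [-0.5, 0.2, -0.1, -0.3]]
bias = [0.0, 0.0]

fused, probs = classify(tokens, weights, bias)
print(len(fused), probs)
```

The fused vector has twice the word-vector dimension because each channel contributes one pooled vector of the original size. In a real implementation each stand-in would be replaced by a trained layer, and the word vectors would come from the pre-trained model rather than being written by hand.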

ZANG Jie, LU Jintao, WANG Yan, LI Xiang, LIAO Huizhi. Chinese Short Text Sentiment Classification Model Integrating Dual-Channel Features[J]. Computer Engineering and Applications, 2024, 60(21): 116-126.

臧洁, 鲁锦涛, 王妍, 李翔, 廖慧之. 融合双通道特征的中文短文本情感分类模型[J]. 计算机工程与应用, 2024, 60(21): 116-126.
