Computer Engineering and Applications, 2024, Vol. 60, Issue (21): 116-126. DOI: 10.3778/j.issn.1002-8331.2310-0020

• Pattern Recognition and Artificial Intelligence •

Chinese Short Text Sentiment Classification Model Integrating Dual-Channel Features

ZANG Jie, LU Jintao, WANG Yan, LI Xiang, LIAO Huizhi

  1. College of Information, Liaoning University, Shenyang 110036, China
  • Online: 2024-11-01 Published: 2024-10-25

Abstract: Chinese short texts are characterized by sparse features, frequent ambiguity, non-standard expression, and rich sentiment. Existing deep-learning-based models for Chinese short text sentiment classification extract text features insufficiently and attend only to semantic information while ignoring syntactic information. To address these problems, a Chinese short text sentiment classification model that integrates dual-channel features is proposed. First, a pre-trained model produces dynamic word vectors, giving the model richer linguistic features and explicit syntactic information. Second, two channels extract text features from the dynamic word vectors. The upper channel uses an improved DPCNN network to extract rich long-distance dependencies in the text. The lower channel applies multi-head self-attention between the word-level features at each time step of a bidirectional long short-term memory network and the text-level features, learning fuller text features and paying more attention to the words that matter most to the classification result. Finally, the features from the two channels are concatenated to obtain the final text representation. Experimental results show that the model reaches accuracies of 96.54%, 92.05%, and 94.3% on the ChnSentiCorp, Weibo review, and e-commerce review datasets, respectively, which are 2.28, 2.44, and 1.01 percentage points higher than the average accuracy of the comparison models. The model effectively improves text classification accuracy and provides a new theoretical model for Chinese short text sentiment classification.
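The reported margins are stated in percentage points over the average accuracy of the comparison models, so the averages themselves can be backed out by subtraction. The short sketch below does that arithmetic; the implied averages are derived values, not figures reported in the abstract, and the dictionary and function names are illustrative only.

```python
from decimal import Decimal

# Accuracy (%) of the proposed model and its margin over the
# comparison-model average (percentage points), as given in the abstract.
reported = {
    "ChnSentiCorp": (Decimal("96.54"), Decimal("2.28")),
    "Weibo reviews": (Decimal("92.05"), Decimal("2.44")),
    "E-commerce reviews": (Decimal("94.3"), Decimal("1.01")),
}


def implied_baseline_average(accuracy, margin):
    """Comparison-model average implied by accuracy minus margin.

    Assumes the margin is a plain difference in percentage points,
    which is how the abstract words it.
    """
    return accuracy - margin


# Derived values (hypothetical helper output, not reported results).
baselines = {
    name: implied_baseline_average(accuracy, margin)
    for name, (accuracy, margin) in reported.items()
}

for name, (accuracy, margin) in reported.items():
    print(f"{name}: proposed {accuracy}%, implied comparison average {baselines[name]}%")
```

Decimal arithmetic is used here so that the two-decimal figures subtract exactly rather than picking up binary floating-point noise.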

Keywords: text sentiment classification, pre-trained model, deep learning, attention mechanism