Computer Engineering and Applications (计算机工程与应用), 2024, Vol. 60, Issue (3): 196-204. DOI: 10.3778/j.issn.1002-8331.2209-0026

• Pattern Recognition and Artificial Intelligence •

Chinese Implicit Sentiment Classification Model Based on Sentiment Feature Enhancement

谈光璞,朱广丽,韦斯羽   

  1. 安徽理工大学 计算机科学与工程学院,安徽 淮南 232001
  • Publication date: 2024-02-01 Release date: 2024-02-01

Implicit Sentiment Classification Model Based on Enhancement of Sentiment Features Oriented to Chinese Text

TAN Guangpu, ZHU Guangli, WEI Siyu   

  1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China
  • Online:2024-02-01 Published:2024-02-01

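摘要 (Abstract): Implicit sentiment sentences lack explicit sentiment words and their semantic features cannot be mined deeply, so existing models classify their sentiment with low accuracy. To address this, a Chinese implicit sentiment classification model based on sentiment feature enhancement (CISC) is proposed. It builds positive and negative sentiment lexicons and embeds sentiment words at detected positions to obtain sentences with enhanced sentiment features, which improves classification accuracy. The sentence is first preprocessed into its word sequence. A sentiment word detection method based on self-attention then locates the sentiment word position in the sentence, positive and negative words are embedded there separately, and a multi-layer attention network produces the corresponding positive and negative sentence representations. Each sentence representation is passed through a Bi-GRU model and an interactive attention mechanism (attention over attention, AOA) to extract the corresponding semantic features. Softmax is applied to each set of semantic features to compute sentiment polarity probabilities; the positive sentiment probabilities of the sentences with positive words embedded and the negative sentiment probabilities of the sentences with negative words embedded are averaged and compared to give the final sentiment polarity. Comparison experiments against EBA, GGBA, and several other models on the SMP-ECISA2019 public dataset show that the proposed CISC model improves classification of Chinese implicit sentiment text.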

关键词 (Key words): implicit sentiment, sentiment classification, sentiment word, feature enhancement, semantic feature
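The abstract names attention over attention (AOA) as one of the feature extraction steps but does not say which two sequences CISC pairs or how the hidden states are produced. The following is a minimal sketch of the generic AOA computation as it is commonly described, written in plain Python. The function name `attention_over_attention` and the inputs `H_a` and `H_b` (two lists of equal-dimension hidden vectors, for example Bi-GRU outputs) are hypothetical placeholders, not names from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_attention(H_a, H_b):
    """Generic AOA between two sequences of hidden vectors.

    H_a: n vectors of dimension d (assumed: states of the sequence to summarise)
    H_b: m vectors of dimension d (assumed: states of the sequence attended from)
    Returns (gamma, r): gamma holds n attention weights over H_a,
    r is the gamma-weighted sum of H_a (a d-dimensional feature vector).
    """
    if not H_a or not H_b:
        raise ValueError("both sequences must be non-empty")
    n, m, d = len(H_a), len(H_b), len(H_a[0])

    # Pairwise interaction scores: scores[i][j] = dot(H_a[i], H_b[j]).
    scores = [[sum(x * y for x, y in zip(a, b)) for b in H_b] for a in H_a]

    # Column-wise softmax: for each j, a distribution over positions i of H_a.
    cols = [softmax([scores[i][j] for i in range(n)]) for j in range(m)]
    alpha = [[cols[j][i] for j in range(m)] for i in range(n)]

    # Row-wise softmax: for each i, a distribution over positions j of H_b.
    beta = [softmax(row) for row in scores]

    # Average the row-wise attention over all i to get one weight per j.
    beta_bar = [sum(beta[i][j] for i in range(n)) / n for j in range(m)]

    # Attention over attention: combine alpha's columns using beta_bar.
    gamma = [sum(alpha[i][j] * beta_bar[j] for j in range(m)) for i in range(n)]

    # Final feature vector: weighted sum of H_a under gamma.
    r = [sum(gamma[i] * H_a[i][k] for i in range(n)) for k in range(d)]
    return gamma, r
```

Because every column of `alpha` and the vector `beta_bar` each sum to 1, the weights in `gamma` also sum to 1, so `r` is a convex combination of the vectors in `H_a`.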

Abstract: Implicit sentiment sentences lack explicit sentiment words, so their semantic features are hard to mine deeply and existing models classify them with low accuracy. To address this problem, this paper proposes CISC, an implicit sentiment classification model for Chinese text based on the enhancement of sentiment features. Positive and negative sentiment lexicons are constructed, and sentiment words are embedded at detected positions to produce sentences with enhanced sentiment features, which improves classification accuracy. First, each sentence is preprocessed into its word sequence. Second, a sentiment word detection method based on self-attention locates the sentiment word position in the sentence, positive and negative words are embedded there separately, and hierarchical attention networks produce the corresponding positive and negative sentence representations. Third, each sentence representation is passed through a Bi-GRU model and attention over attention (AOA) to obtain the corresponding feature vector. Finally, the feature vectors are fed to Softmax to compute sentiment probabilities: the positive sentiment probabilities of the sentences with positive words embedded and the negative sentiment probabilities of the sentences with negative words embedded are each averaged, and the two averages are compared to give the final sentiment polarity. Experiments on the SMP-ECISA2019 public dataset show that CISC classifies Chinese implicit sentiment text better than models such as EBA and GGBA.

Key words: implicit sentiment, sentiment classification, sentiment word, feature enhancement, semantic feature
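The final decision step described in the abstract, averaging the positive probabilities of the positive-word sentences and the negative probabilities of the negative-word sentences and then comparing the two means, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `final_polarity` and its arguments are hypothetical, the probabilities are assumed to be the relevant Softmax outputs for each enhanced sentence variant, and resolving ties toward "positive" is an arbitrary choice that the abstract does not specify.

```python
from statistics import fmean


def final_polarity(pos_probs, neg_probs):
    """Compare mean positive and mean negative probabilities.

    pos_probs: P(positive) for each sentence variant with a positive word embedded
    neg_probs: P(negative) for each sentence variant with a negative word embedded
    Returns (label, mean_pos, mean_neg).
    """
    if not pos_probs or not neg_probs:
        raise ValueError("need at least one probability on each side")
    mean_pos = fmean(pos_probs)
    mean_neg = fmean(neg_probs)
    # Assumption: ties go to "positive"; the abstract does not define this case.
    label = "positive" if mean_pos >= mean_neg else "negative"
    return label, mean_pos, mean_neg


if __name__ == "__main__":
    # Hypothetical probabilities for two positive-word and two negative-word variants.
    print(final_polarity([0.8, 0.6], [0.3, 0.5]))
```

Returning both means alongside the label keeps the margin between them visible, which is useful when inspecting borderline sentences.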