Computer Engineering and Applications, 2022, Vol. 58, Issue (12): 170-176. DOI: 10.3778/j.issn.1002-8331.2011-0056

• Pattern Recognition and Artificial Intelligence •

ALBERT-AFSFN Based Sentiment Analysis of Chinese Short Text

YE Xingxin, XU Yang, LUO Mengshi

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum Magnesium Design & Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2022-06-15 Published: 2022-06-15

Abstract: Traditional convolutional neural networks fail to make full use of the semantic and association information carried by text features in different channels, and traditional word vector representations extract text information statically and ignore the positional information of the text. Both shortcomings lead to inaccurate text sentiment classification. To address them, this paper proposes ALBERT-AFSFN, a Chinese short text sentiment classification model that combines ALBERT (a lite BERT) with an attention feature split fusion network (AFSFN). First, the model uses ALBERT to produce word vectors for the text, improving their representation ability. Second, the attention feature split fusion network splits the features into two groups, then extracts and fuses the features of the two groups of channels so as to retain as much of the semantic association information between channels as possible. Finally, a Softmax function classifies the sentiment of the Chinese short text and yields its sentiment polarity. On the three public datasets Chnsenticorp, waimai-10k and weibo-100k, the model reaches accuracies of 93.33%, 88.98% and 97.81% and F1 values of 93.23%, 88.47% and 97.78%, respectively. These results indicate that the proposed method achieves better classification performance in sentiment analysis of Chinese short text.

Key words: ALBERT, split attention, feature fusion, sentiment analysis
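
The split, fuse and classify steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' AFSFN architecture, whose layers are not specified here. It assumes a sentence feature vector is already available (a hard-coded list stands in for ALBERT output), and it replaces the learned attention with a simple per-channel softmax gate over the two groups. The function names `split_channels`, `attention_fuse` and `classify`, as well as the weights and inputs, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def split_channels(features):
    """Split a feature vector into two equal channel groups."""
    half = len(features) // 2
    return features[:half], features[half:]


def attention_fuse(group_a, group_b):
    """Fuse two groups channel by channel.

    Assumption: the channel activations themselves act as attention
    logits; a real model would learn these weights.
    """
    fused = []
    for a, b in zip(group_a, group_b):
        w_a, w_b = softmax([a, b])
        fused.append(w_a * a + w_b * b)
    return fused


def classify(fused, weights, bias):
    """Linear layer followed by Softmax over the sentiment classes."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Placeholder for an ALBERT sentence representation (8 channels).
features = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.0, 1.0]
group_a, group_b = split_channels(features)
fused = attention_fuse(group_a, group_b)

# Hypothetical untrained classifier weights for two classes.
weights = [[0.1, 0.1, 0.1, 0.1], [-0.1, -0.1, -0.1, -0.1]]
bias = [0.0, 0.0]
probs = classify(fused, weights, bias)
print(probs)
```

Because each fused channel is a convex combination of the two group activations, the gate favours the stronger channel while still mixing in the weaker one, which is one simple way to keep information from both groups.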