Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (18): 165-170. DOI: 10.3778/j.issn.1002-8331.2001-0148

• Pattern Recognition and Artificial Intelligence •

Short Text Emotion Classification Method Combining LDA and Self-Attention

CHEN Huan, HUANG Bo, ZHU Yimin, YU Lei, YU Yuxin   

  1. School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China
  2. Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control, Nanchang 330103, China
  3. School of Economics and Finance, Shanghai International Studies University, Shanghai 201620, China
  • Online: 2020-09-15 Published: 2020-09-10

Abstract:

In short-text emotion classification, the brevity of each text makes the data sparse, which lowers classification accuracy. To address this problem, this paper proposes a short-text emotion classification method based on Latent Dirichlet Allocation (LDA) and Self-Attention. LDA is used to obtain the topic-word distribution of each review, which serves as an extension of that review's information. The extended information and the original review text are fed together into a word2vec model to train word vectors, so that review texts on the same topic cluster together in the high-dimensional vector space. Self-Attention then assigns dynamic weights to the word vectors, and the weighted representation is used for classification. Experiments on the Tan Songbo hotel review dataset show that the proposed algorithm effectively improves classification performance compared with current mainstream short-text emotion classification algorithms.

Keywords: topic word, short text, Self-Attention, Latent Dirichlet Allocation (LDA), word2vec
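
The pipeline summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the topic probabilities and topic words stand in for the output of a trained LDA model, the two-dimensional vectors stand in for trained word2vec embeddings, and the attention uses no learned projection matrices (queries, keys, and values are all the raw word vectors). The helper names `expand_with_topic_words` and `self_attention`, the `top_k` parameter, and the mean pooling at the end are assumptions made for illustration only.

```python
import math

def expand_with_topic_words(tokens, doc_topic_probs, topic_words, top_k=2):
    """Append the top_k words of the review's most probable topic.

    Assumption: the "extension" is taken from the single dominant topic;
    the abstract does not specify how the topic words are selected.
    """
    best_topic = max(range(len(doc_topic_probs)), key=lambda t: doc_topic_probs[t])
    return tokens + topic_words[best_topic][:top_k]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Similarity of this word to every word in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all word vectors, one output vector per word.
        outputs.append(
            [sum(row[j] * vectors[j][i] for j in range(len(vectors))) for i in range(dim)]
        )
    return outputs, weights

# Hypothetical stand-ins for LDA output on one short hotel review.
tokens = ["room", "clean"]
doc_topic_probs = [0.2, 0.8]
topic_words = [["price", "cheap", "value"], ["hygiene", "tidy", "bathroom"]]
expanded = expand_with_topic_words(tokens, doc_topic_probs, topic_words, top_k=2)

# Toy 2-d vectors in place of trained word2vec embeddings.
toy_vectors = {
    "room": [1.0, 0.0],
    "clean": [0.0, 1.0],
    "hygiene": [0.1, 0.9],
    "tidy": [0.2, 0.8],
}
vectors = [toy_vectors[word] for word in expanded]
attended, weights = self_attention(vectors)

# Mean-pool the attended vectors into one review vector for a classifier
# (the pooling choice is an assumption, not taken from the paper).
review_vector = [sum(column) / len(attended) for column in zip(*attended)]
```

In this toy setting the word "clean" places more attention weight on the appended topic word "hygiene" than on "room", which gives a rough sense of how topic-word expansion could reinforce the dominant theme of a sparse short text before classification.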