Computer Engineering and Applications (计算机工程与应用) ›› 2015, Vol. 51 ›› Issue (7): 124-130.

• Database, Data Mining, Machine Learning •

Dimension identification method for text sentiment clustering

LI Xin (李欣)1, WANG Suge (王素格)1,2, LI Deyu (李德玉)1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online: 2015-04-01 Published: 2015-03-31

Abstract: In text sentiment analysis, supervised machine learning methods have the drawback of requiring large amounts of labeled text data, whereas unsupervised text clustering avoids this requirement. Sentiment clustering saves data resources, but it introduces another problem: the clustering result is uncertain. This paper gives a formal description of the sentiment dimension and applies opinion word identification to the discrimination of sentiment dimensions. On this basis, the identified sentiment dimension is used to cluster review texts by sentiment, which effectively resolves the uncertainty of the sentiment clustering result. Experiments on English product reviews from four domains of an Amazon online shopping review corpus show that the proposed method is effective in automatically identifying the sentiment clustering dimension and yields satisfactory sentiment clustering results.

Key words: opinion word identification, dimension identification, sentiment-based text clustering
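
The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes a hand-written opinion lexicon (`OPINION_WORDS`), four toy reviews, and a tiny 2-means routine (`two_means`); all of these are hypothetical and none come from the paper, which identifies opinion words automatically. The sketch only shows why restricting the features to opinion words may steer the clustering toward sentiment rather than topic.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only).
OPINION_WORDS = {"great", "excellent", "love", "poor", "terrible", "broken"}


def vectorize(text, vocabulary):
    """Count only the tokens that belong to the given vocabulary."""
    counts = Counter(tok for tok in text.lower().split() if tok in vocabulary)
    return [counts[word] for word in sorted(vocabulary)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_means(vectors, iterations=10):
    """Tiny 2-means with cosine similarity.

    Seeds: the first vector and the vector least similar to it.
    """
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centroids = [vectors[0], vectors[far]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to the more similar centroid.
        labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


reviews = [
    "great camera excellent pictures",
    "terrible camera broken lens",
    "love this book great story",
    "poor book terrible ending",
]
labels = two_means([vectorize(r, OPINION_WORDS) for r in reviews])
print(labels)  # [0, 1, 0, 1]
```

With the opinion-word features, the two positive reviews share one label and the two negative reviews share the other, even though each cluster mixes a camera review with a book review.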