Computer Engineering and Applications, 2013, Vol. 49, Issue (2): 188-193.

• Database, Data Mining, Machine Learning

Exploiting multiple semantic features for comment text topic clustering

LI Yahong1, WANG Suge1,2, LI Deyu1,2

  1. School of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  2. School of Computer & Information Technology, Key Lab of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online: 2013-01-15 Published: 2013-01-16

Abstract: Features are central to all opinion mining and sentiment analysis tasks, and in unsupervised text clustering in particular, the quality of the text features directly affects the clustering results. This paper examines how three kinds of semantic features (nouns, noun phrases, and semantic roles) contribute to topic clustering, together with the compatibility relations among the different features, and proposes a method for eliminating redundant features. The method effectively removes redundant features and improves clustering accuracy. The paper also proposes a clustering method based on semantic role labeling that directly locates effective word features. Experimental results show that this method is direct and effective, and it offers a new approach to feature selection.

Key words: text topic clustering, noun features, noun phrase features, semantic role features, compatibility relation
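The abstract does not spell out the redundancy-elimination or clustering procedure, so the following Python sketch is only a rough illustration of the general pipeline (features, then redundancy removal, then clustering) under stated assumptions. It assumes each comment has already been reduced to a list of extracted features (nouns, noun phrases, or semantic-role arguments); the extraction step is not shown. It uses a hypothetical redundancy rule (drop a feature whose set of documents duplicates that of an already kept feature) and clusters with ordinary k-means under cosine similarity. All function names and the redundancy rule are assumptions for illustration. They are not the authors' compatibility-based method.

```python
import math
from collections import defaultdict


def build_postings(docs):
    """Map each feature to the set of indices of the documents containing it."""
    postings = defaultdict(set)
    for i, feats in enumerate(docs):
        for f in feats:
            postings[f].add(i)
    return postings


def drop_redundant(postings):
    """Hypothetical redundancy rule (NOT the paper's compatibility relation):
    keep a feature only if no previously kept feature occurs in exactly the
    same set of documents. Features are visited in sorted order."""
    kept, seen = [], set()
    for f in sorted(postings):
        doc_set = frozenset(postings[f])
        if doc_set not in seen:
            seen.add(doc_set)
            kept.append(f)
    return kept


def doc_vectors(docs, features):
    """Binary document-feature vectors restricted to the kept features."""
    index = {f: j for j, f in enumerate(features)}
    vecs = []
    for feats in docs:
        v = [0.0] * len(features)
        for f in feats:
            if f in index:
                v[index[f]] = 1.0
        vecs.append(v)
    return vecs


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(v, centroids):
    """Index of the most cosine-similar centroid (ties go to the lowest index)."""
    return max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))


def init_centroids(vecs, k):
    """Deterministic farthest-first seeding: start from document 0, then repeatedly
    add the document least similar to every seed chosen so far."""
    if not 1 <= k <= len(vecs):
        raise ValueError("k must be between 1 and the number of documents")
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vecs)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds)))
    return [list(vecs[s]) for s in seeds]


def kmeans_cosine(vecs, k, max_iter=20):
    """Plain k-means that assigns by cosine similarity; returns one label per document."""
    centroids = init_centroids(vecs, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vecs]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy "comments" with pre-extracted noun features (English tokens for readability).
    docs = [
        ["screen", "battery", "phone"],
        ["battery", "phone", "charger"],
        ["room", "breakfast", "hotel"],
        ["hotel", "room", "staff"],
    ]
    features = drop_redundant(build_postings(docs))
    print(features)  # ['battery', 'breakfast', 'charger', 'hotel', 'screen', 'staff']
    print(kmeans_cosine(doc_vectors(docs, features), k=2))  # [0, 0, 1, 1]
```

In the toy run, "phone" and "room" are dropped because they occur in exactly the same comments as "battery" and "hotel", and the remaining features still separate the phone comments from the hotel comments. Farthest-first seeding is used only to keep the example deterministic. On real comment data, random restarts and a real notion of feature compatibility, such as the one the paper studies, would be needed.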