Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (29): 172-175.

• Database and Information Processing •

Ontology-based adjustable text clustering using abstract degree of concept

WANG Xiao-dong, GUO Lei, FANG Jun, YANG Ning, DENG Tao

  1. College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-10-11 Published: 2007-10-11
  • Contact: WANG Xiao-dong

Abstract: The rapid growth of online text and the demands of practical applications have drawn wide attention to document clustering. To address the main shortcomings of current document clustering methods, this paper proposes a new ontology-based scheme, Adjustable Text Clustering using Abstract Degree of Concept (ATCADC). The scheme first uses the WordNet ontology to map the feature terms of the vector space model (VSM) to concepts and to disambiguate them, and then uses the resulting feature concepts to describe each document as a vector at the semantic level. After two rounds of feature selection, which sharply reduce the complexity of clustering, an agglomerative clustering algorithm (AHC) is applied. With specially designed semantic center vectors, the scheme adjusts the clustering according to the concept abstraction degree set by the user, and key feature concepts can be used to explain the resulting clusters. Experimental results show that the clustering accuracy is high, that the clusters can be explained by feature concepts, and that the adjustment is effective, so the scheme supports clustering at the different levels of concept abstraction that users require.

Key words: ontology, text clustering, concept, disambiguation, abstract degree
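
The pipeline outlined in the abstract (term-to-concept mapping, an adjustable abstraction level, then agglomerative clustering) can be illustrated with a rough Python sketch. It is not the authors' ATCADC implementation: the `HYPERNYM` table is a hypothetical toy stand-in for WordNet, "abstract degree" is approximated as a number of hypernym steps, and disambiguation, the two feature selection rounds, and the semantic center vectors are all omitted. The names `lift`, `concept_vector`, `agglomerate`, and `cluster_docs` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy hypernym table (child -> parent), standing in for WordNet.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "animal",
    "cat": "feline", "lion": "feline", "feline": "animal",
    "car": "vehicle", "truck": "vehicle",
}

def lift(concept, steps):
    """Walk up at most `steps` hypernym links, stopping early at a root."""
    for _ in range(steps):
        if concept not in HYPERNYM:
            break
        concept = HYPERNYM[concept]
    return concept

def concept_vector(terms, steps):
    """Bag-of-concepts vector for one document at the chosen abstraction."""
    return Counter(lift(t, steps) for t in terms)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def agglomerate(vectors, threshold):
    """Average-link agglomerative clustering: keep merging the most similar
    pair of clusters until the best similarity drops below `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (x, y), best = max(
            ((pair, link(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Four tiny "documents", already reduced to feature terms.
docs = [["dog", "wolf"], ["dog", "dog"], ["cat", "lion"], ["car", "truck"]]

def cluster_docs(steps, threshold=0.5):
    """Cluster `docs` after lifting every term `steps` levels up."""
    return agglomerate([concept_vector(d, steps) for d in docs], threshold)

print(cluster_docs(0))  # fine-grained concepts
print(cluster_docs(2))  # coarser concepts
```

In this toy setting, raising `steps` from 0 to 2 lets the document about cats and lions join the cluster about dogs and wolves, because all of their terms lift to the shared concept `animal`, while the vehicle document stays separate. A real system would replace the toy table with WordNet lookups and add word sense disambiguation before the lifting step.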