Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (5): 178-181.

• Database and Information Processing •

A Document Clustering Approach Based on Term Clustering and Association Rules

Jianmin Xu (徐建民), Yuepeng Cheng (成岳鹏), Lijun Xin (辛丽军)

  1. Industrial and Commercial College, Hebei University; College of Mathematics and Computer Science, Hebei University, Baoding
  • Received: 2006-03-10 Revised: 1900-01-01 Online: 2007-02-11 Published: 2007-02-11
  • Contact: Yuepeng Cheng

Abstract: This paper proposes a new document clustering approach based on term clusters and association rules. The document collection is first segmented into words. Term clusters are then formed according to the average mutual information (AMI) between terms, and the documents are represented in a vector space model (VSM) built on term clusters. Association rule mining is used to obtain an initial document clustering, and cluster analysis of this initial result yields the final document clusters. Experimental results show that, compared with traditional clustering methods, the approach runs faster and clearly improves both clustering effectiveness and clustering quality.
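The first two steps in the abstract (forming term clusters from average mutual information, then representing documents over those clusters) can be illustrated with a minimal sketch. The abstract does not give the paper's exact AMI formula, grouping procedure, or weighting scheme, so everything below is an assumption: AMI is taken as the mutual information between the presence/absence of two terms across documents, terms are grouped by simple single-link merging above a hypothetical `threshold`, and a document vector just counts its terms per cluster. The function names are illustrative, and the association-rule and final clustering steps are not covered.

```python
import math
from itertools import combinations


def joint_counts(docs, x, y):
    """Count documents for each presence/absence combination of terms x and y."""
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for doc in docs:
        counts[(x in doc, y in doc)] += 1
    return counts


def average_mutual_information(docs, x, y):
    """Assumed AMI: mutual information (in bits) between the two binary
    variables 'x occurs in the document' and 'y occurs in the document'."""
    n = len(docs)
    counts = joint_counts(docs, x, y)
    ami = 0.0
    for (a, b), c in counts.items():
        if c == 0:
            continue  # a zero-probability cell contributes nothing
        p_ab = c / n
        p_a = (counts[(a, True)] + counts[(a, False)]) / n
        p_b = (counts[(True, b)] + counts[(False, b)]) / n
        ami += p_ab * math.log2(p_ab / (p_a * p_b))
    return ami


def co_occur_above_chance(docs, x, y):
    """True if x and y appear together more often than independence predicts.
    Needed because mutual information is also high for terms that avoid each other."""
    n = len(docs)
    counts = joint_counts(docs, x, y)
    p_x = (counts[(True, True)] + counts[(True, False)]) / n
    p_y = (counts[(True, True)] + counts[(False, True)]) / n
    return counts[(True, True)] / n > p_x * p_y


def term_clusters(docs, threshold):
    """Single-link grouping (an assumption): merge two terms when they are
    positively associated and their AMI reaches the threshold."""
    terms = sorted(set().union(*docs))
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for x, y in combinations(terms, 2):
        if (co_occur_above_chance(docs, x, y)
                and average_mutual_information(docs, x, y) >= threshold):
            parent[find(x)] = find(y)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


def cluster_vector(doc, clusters):
    """Document vector with one dimension per term cluster: the number of
    the cluster's terms that occur in the document."""
    return [sum(1 for t in cluster if t in doc) for cluster in clusters]


# Toy collection: each document is the set of terms left after word segmentation.
docs = [
    {"cluster", "kmeans"},
    {"cluster", "kmeans"},
    {"rule", "apriori"},
    {"rule", "apriori"},
]
clusters = term_clusters(docs, threshold=0.5)
vectors = [cluster_vector(d, clusters) for d in docs]
```

Because each dimension is a term cluster rather than a single term, the vectors are shorter than in a term-level VSM, which is one plausible reason for the speed-up the abstract reports; the sketch does not attempt to reproduce that result.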