Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (13): 53-55.DOI: 10.3778/j.issn.1002-8331.2010.13.016

• Research and Discussion •

Research of text clustering based on fuzzy granular computing

ZHANG Xia1,2,YIN Yi-xin1,YU Hai-yan1,2,ZHAO Hai-long1   

  1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China
  2. Computer Center, Hebei University of Economics and Business, Shijiazhuang 050061, China
  • Received:2009-03-06 Revised:2009-04-21 Online:2010-05-01 Published:2010-05-01
  • Contact: ZHANG Xia

基于模糊粒度计算的文本聚类研究

张 霞1,2,尹怡欣1,于海燕1,2,赵海龙1   


Abstract: Typical text clustering algorithms perform a "hard partition". In practice, Chinese text is better suited to a "soft partition" because of its diversity and volume, and fuzzy set theory provides a powerful analytical tool for such soft partitioning. Traditional fuzzy text clustering methods mostly obtain a fuzzy equivalence matrix or a fuzzy partition by iterating on the membership-degree matrix, a process that requires a large amount of storage. The text clustering algorithm based on fuzzy granular computing works as follows: first, a normalized distance function $d(d_i, d_j)$ is defined on the fuzzy granularity space of the document set; then, documents whose distance is smaller than the granularity $d_\lambda$ are clustered dynamically. Experiments show that this method reduces both computational complexity and space complexity, making it suitable for clustering large numbers of texts.
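As a rough illustration of the idea in the abstract, the sketch below groups documents whose normalized distance falls below a granularity threshold. The abstract does not give the paper's actual distance function or its dynamic clustering procedure, so everything here is an assumption: cosine distance on term-frequency vectors stands in for the normalized distance, documents are assumed to be pre-tokenized with whitespace-separated terms, and grouping is done by linking every pair closer than the threshold with a union-find structure. The names `normalized_distance`, `cluster_at_granularity`, and `d_lambda` are hypothetical.

```python
import math
from collections import Counter


def normalized_distance(a: Counter, b: Counter) -> float:
    """Cosine distance between two term-frequency vectors, in [0, 1].

    Assumption: a stand-in for the paper's normalized distance function.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def cluster_at_granularity(docs, d_lambda):
    """Group documents linked by chains of pairwise distances below d_lambda.

    Assumption: transitive linking via union-find, not the paper's procedure.
    Returns a sorted list of clusters, each a list of document indices.
    """
    vectors = [Counter(doc.split()) for doc in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if normalized_distance(vectors[i], vectors[j]) < d_lambda:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    docs = [
        "fuzzy set clustering",
        "fuzzy clustering text",
        "stock market price",
        "market price index",
    ]
    print(cluster_at_granularity(docs, d_lambda=0.5))
```

In this toy input, a threshold of 0.5 groups the first two documents together and the last two together; a smaller threshold yields finer clusters and a larger one coarser clusters. Distances are computed pair by pair and only the union-find array is kept, so this sketch never stores a full membership or similarity matrix.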

