Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (15): 166-168. DOI: 10.3778/j.issn.1002-8331.2009.15.048

• Database, Signal and Information Processing •

Easy and efficient algorithm to determine number of clusters

ZHANG Zhong-ping (张忠平), WANG Ai-jie (王爱杰), CHAI Xu-guang (柴旭光)

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Received: 2008-03-24 Revised: 2008-06-16 Online: 2009-05-21 Published: 2009-05-21
  • Contact: ZHANG Zhong-ping

Abstract: Many clustering algorithms require the user to specify the number of clusters before clustering, which is difficult in practice. The proposed algorithm uses a bisection strategy to recursively split every cluster whose intra-cluster similarity value exceeds a given threshold, and then merges the clusters whose inter-cluster similarity value falls below a given threshold, to obtain the final number of clusters. Experiments show that the number of clusters found by the algorithm equals the actual number of clusters in the data, with high similarity within clusters and low similarity between clusters. The algorithm is simple and efficient.
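
The split-then-merge idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the similarity measures, so this sketch assumes distance-like measures (mean distance to the centroid for the intra-cluster value, centroid-to-centroid distance for the inter-cluster value) and a small deterministic 2-means step for the bisection. The names `spread`, `bisect`, `estimate_clusters`, `split_threshold` and `merge_threshold` are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def spread(points):
    """Assumed intra-cluster measure: mean distance to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def bisect(points, iters=10):
    """Split a cluster in two with a small deterministic 2-means."""
    # Seed with two mutually distant points.
    a = max(points, key=lambda p: math.dist(p, points[0]))
    b = max(points, key=lambda p: math.dist(p, a))
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        new_right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last valid partition
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right

def estimate_clusters(points, split_threshold, merge_threshold):
    """Split loose clusters repeatedly, then merge clusters that are close."""
    # Phase 1: keep bisecting any cluster whose spread exceeds the threshold.
    pending, done = [list(points)], []
    while pending:
        cluster = pending.pop()
        if len(cluster) > 1 and spread(cluster) > split_threshold:
            left, right = bisect(cluster)
            if left and right:
                pending += [left, right]
                continue
        done.append(cluster)
    # Phase 2: merge pairs whose centroids are closer than the threshold.
    merged = True
    while merged and len(done) > 1:
        merged = False
        for i in range(len(done)):
            for j in range(i + 1, len(done)):
                gap = math.dist(centroid(done[i]), centroid(done[j]))
                if gap < merge_threshold:
                    done[i] = done[i] + done.pop(j)
                    merged = True
                    break
            if merged:
                break
    return done

# Toy data: three well-separated groups of four points each.
points = [(x + dx, dy) for x in (0, 10, 30) for dx in (0, 1) for dy in (0, 1)]
clusters = estimate_clusters(points, split_threshold=1.0, merge_threshold=3.0)
print(len(clusters))  # 3 for this toy data
```

In this sketch the number of clusters is simply `len(clusters)`. The two thresholds still have to be chosen by the user, and the abstract does not say how the paper selects them.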