Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (23): 143-146. DOI: 10.3778/j.issn.1002-8331.2008.23.044

• Database, Signal and Information Processing •

GDCAP: grid and density based clustering algorithm with precise cluster boundaries

SHAN Shi-min, ZHANG Ning, JIANG He, ZHANG Xian-chao

  1. School of Software, Dalian University of Technology, Dalian, Liaoning 116621, China
  • Received: 2007-10-18 Revised: 2008-01-24 Online: 2008-08-11 Published: 2008-08-11
  • Contact: SHAN Shi-min

Abstract: Existing grid-based clustering algorithms achieve low time complexity at the expense of clustering quality. Their results are often not ideal, and data points near cluster boundaries in particular may be assigned inaccurately. This paper proposes an algorithm that refines the precision of the grid cells lying on cluster boundaries: the uncertain points in these boundary cells are restored to their original per-point information and then assigned to the appropriate clusters by a similarity function. Experiments on spatial data sets show that this boundary-precision-enhanced clustering algorithm obtains better clustering results than CLIQUE in O(n) time.

Key words: data clustering, grid-based, density-based, hybrid algorithm
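
The abstract gives only the outline of the method, so the following is a minimal Python sketch of the general idea rather than the authors' GDCAP implementation. It bins points into grid cells, joins adjacent dense cells into coarse clusters, and then revisits the points in sparse cells that border a dense cell, assigning each one individually. Everything specific here is an assumption made for illustration: the function name `grid_density_cluster`, the parameters `cell_size`, `min_pts` and `eps`, the rule that a boundary cell is a sparse cell adjacent to a dense one, and the use of Euclidean distance to the nearest clustered point as the similarity function. The sketch also makes no attempt to reproduce the O(n) bound reported in the abstract.

```python
from collections import defaultdict, deque
from itertools import product
from math import dist, floor


def grid_density_cluster(points, cell_size, min_pts, eps):
    """Illustrative grid/density clustering with per-point boundary refinement.

    Returns one label per point; -1 marks noise. All parameters are
    assumptions of this sketch, not values taken from the paper.
    """
    dim = len(points[0])

    # Bin every point into its grid cell (cell index = floor(coordinate / size)).
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(floor(x / cell_size) for x in p)].append(i)

    # All neighbouring offsets in {-1, 0, 1}^dim except the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    def neighbours(cell):
        return (tuple(a + b for a, b in zip(cell, o)) for o in offsets)

    dense = {c for c, idx in cells.items() if len(idx) >= min_pts}

    # Step 1: connected components of dense cells become coarse clusters.
    cell_label = {}
    n_clusters = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for nb in neighbours(cell):
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    labels = [-1] * len(points)
    for cell, label in cell_label.items():
        for i in cells[cell]:
            labels[i] = label

    # Step 2: treat points in sparse cells next to a dense cell as uncertain
    # and assign each one by distance to the nearest already-clustered point.
    for cell, idx in cells.items():
        if cell in dense:
            continue
        candidates = [j for nb in neighbours(cell) if nb in dense
                      for j in cells[nb]]
        if not candidates:
            continue  # isolated sparse cell: its points stay noise
        for i in idx:
            j = min(candidates, key=lambda k: dist(points[i], points[k]))
            if dist(points[i], points[j]) <= eps:
                labels[i] = labels[j]
    return labels
```

Called with a list of coordinate tuples, for example `grid_density_cluster(points, cell_size=1.0, min_pts=3, eps=0.5)`, the function returns a list of integer labels in the same order as the input. A purely cell-level method would discard every point in a sparse cell, whereas the second step can recover points that fall just outside a dense cell.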