Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (7): 198-201.

• Database and Information Processing •

Similarity-based Grid Clustering Algorithm

  

  • Received:2006-03-30 Revised:1900-01-01 Online:2007-03-01 Published:2007-03-01

基于相似度的网格聚类算法

刘敏娟 柴玉梅 张西芝   

  1. School of Information Engineering, Zhengzhou University
  • Corresponding author: 刘敏娟

Abstract: This paper presents a similarity-based grid clustering algorithm (SGCA). SGCA uses a grid to remove some of the outliers or noise in the dataset, applies a border-point threshold function to handle the border points of clusters, and then clusters the data with a similarity measure. It scans the dataset only once and can discover clusters of arbitrary shape. Experimental results show that SGCA identifies outliers and noise effectively and produces good cluster quality. It is suitable for synthetic datasets and also gives good clustering results on some high-dimensional datasets. To improve the efficiency of SGCA, a grid-core technique is also introduced.

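Abstract (translated from the Chinese): A similarity-based grid clustering algorithm (SGCA) is proposed. The algorithm uses grid techniques to remove some of the outliers or noise in the dataset, applies a border-point threshold function to extract the border points of clusters, and finally clusters the data by a similarity method. SGCA requires only a single scan of the dataset. Experiments show that the algorithm scales well, handles clusters of arbitrary shape and size, and identifies outliers and noise well; it is suitable not only for synthetic datasets but also gives good clustering results on high-dimensional datasets. A grid-core technique is also introduced, which further improves the time complexity of SGCA.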
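
The abstract does not specify the border-point threshold function, the similarity measure, or the grid-core technique, so none of these is reproduced here. As a rough illustration of the grid step only, the following Python sketch shows a generic density-grid clustering: points are binned into cells in a single pass, sparse cells are treated as noise, and adjacent dense cells are merged into clusters. This is not the SGCA itself; the function name `grid_cluster` and the parameters `cell_size` and `min_points` are assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_points):
    """Generic density-grid clustering sketch (NOT the paper's SGCA).

    cell_size and min_points are hypothetical parameters, not taken
    from the paper. Returns (labels, n_clusters); label -1 marks noise.
    """
    # Single pass over the data: record each point index in its grid cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # Cells holding fewer than min_points points are treated as noise.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    # Offsets to every neighbouring cell (all combinations except zero).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    labels = [-1] * len(points)
    cell_label = {}
    n_clusters = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        # Breadth-first search merges connected dense cells into one cluster.
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = n_clusters
            for off in offsets:
                neighbour = tuple(c + o for c, o in zip(cell, off))
                if neighbour in dense and neighbour not in cell_label:
                    cell_label[neighbour] = n_clusters
                    queue.append(neighbour)
        n_clusters += 1
    return labels, n_clusters
```

A call such as `grid_cluster(points, cell_size=1.0, min_points=3)` takes a list of equal-length coordinate tuples and returns one label per point. Because clusters are grown through adjacent cells, they can take arbitrary shapes. Note that enumerating all neighbouring cells costs 3^d − 1 lookups per cell in d dimensions, so this naive sketch would need a different neighbourhood rule for high-dimensional data.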