Computer Engineering and Applications, 2007, Vol. 43, Issue 7: 198-201.
• Database and Information Processing •
刘敏娟 (LIU Minjuan), 柴玉梅 (CHAI Yumei), 张西芝 (ZHANG Xizhi)
Abstract: This paper proposes a similarity-based grid clustering algorithm (SGCA). SGCA first uses a grid technique to remove part of the outliers or noise in the dataset, then extracts the border points of clusters with a border-point threshold function, and finally clusters the data using a similarity measure. SGCA requires only a single scan of the dataset. Experiments show that the algorithm scales well, discovers clusters of arbitrary shape and size, and identifies outliers and noise effectively; it produces good clustering results not only on synthetic datasets but also on high-dimensional datasets. The paper further introduces a grid-core technique that improves the time complexity of SGCA.
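To make the general grid-clustering idea concrete, here is a minimal sketch under stated assumptions. It is not SGCA: the abstract does not define the border-point threshold function, the similarity measure, or the grid-core technique, so this sketch substitutes a plain density threshold (cells with fewer than `min_pts` points are treated as noise) and merges adjacent dense cells in place of the paper's similarity step. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are hypothetical. As in the abstract, the raw points are scanned only once, when they are bucketed into cells.

```python
from collections import defaultdict
from itertools import product


def grid_cluster(points, cell_size, min_pts):
    """Illustrative grid-based clustering (hypothetical; NOT the paper's SGCA).

    Returns one label per point; -1 marks points in sparse (noise) cells.
    """
    # Single scan over the data: bucket every point into its grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # Assumed noise rule: cells holding fewer than min_pts points are outliers.
    dense = {k for k, members in cells.items() if len(members) >= min_pts}

    # Assumed merge rule (stand-in for the paper's similarity measure):
    # dense cells that touch, including diagonally, join the same cluster.
    labels = [-1] * len(points)
    visited = set()
    cluster_id = 0
    for start in sorted(dense):  # sorted for deterministic cluster ids
        if start in visited:
            continue
        visited.add(start)
        stack = [start]
        while stack:
            cell = stack.pop()
            for i in cells[cell]:
                labels[i] = cluster_id
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(a + b for a, b in zip(cell, offset))
                if nb in dense and nb not in visited:
                    visited.add(nb)
                    stack.append(nb)
        cluster_id += 1
    return labels


pts = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2),
       (5.1, 5.1), (5.2, 5.3), (5.3, 5.2),
       (20.0, 20.0)]
print(grid_cluster(pts, 1.0, 2))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (20, 20) sits alone in its cell and is labeled -1 (noise), while the two dense groups form separate clusters. Because the work after bucketing is proportional to the number of non-empty cells rather than the number of points, grid methods of this kind tend to scale well, which is consistent with the scalability claim above; the exact complexity of SGCA itself is not given in the abstract.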
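LIU Minjuan, CHAI Yumei, ZHANG Xizhi. Similarity-based grid clustering algorithm (基于相似度的网格聚类算法) [J]. Computer Engineering and Applications, 2007, 43(7): 198-201.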
Link to this article: http://cea.ceaj.org/CN/Y2007/V43/I7/198