Computer Engineering and Applications, 2008, Vol. 44, Issue (16): 139-141.

• Database, Signal and Information Processing •

Data clustering algorithm based on a grid index

LI Jun (李 筠), SONG Kai (宋 凯), JIANG Xue-jun (姜学军)

  1. Information Science and Engineering College, Shenyang Ligong University, Shenyang 110168, China
  • Received: 2007-09-10 Revised: 2007-10-13 Online: 2008-06-01 Published: 2008-06-01
  • Contact: LI Jun

Abstract: To improve the efficiency of density-based clustering and avoid redundant searches during execution, this paper proposes an improved spatial data clustering algorithm based on DBSCAN. The algorithm partitions the neighborhood space of each object and applies a grid index structure. Within the neighborhood of a core object, it selects, in each of eight directions, the unmarked object that lies farthest from the core object and uses these objects as seeds for expansion. This reduces the number of neighborhood queries and lowers the time complexity of clustering. In experiments on massive data sets, the new algorithm was significantly faster than DBSCAN while preserving the same clustering accuracy.

Key words: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), grid index, spatial data, clustering
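
The two ideas named in the abstract, a grid index for neighborhood queries and seed selection in eight directions, can be illustrated with a minimal Python sketch. It is based on the abstract alone and is not the authors' implementation. The cell size (equal to `eps`), the 45-degree sector partition, the 2-D point format, and all function names (`build_grid`, `region_query`, `pick_seeds`, `cluster`) are assumptions made for illustration. Because only some neighbors are expanded, the output may differ from plain DBSCAN on some data sets.

```python
import math
from collections import defaultdict, deque

NOISE = -1

def build_grid(points, eps):
    """Grid index (assumed layout): hash each point into a square cell of side eps."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(i)
    return grid

def region_query(points, grid, i, eps):
    """Indices within eps of point i, scanning only the 3x3 block of cells around it."""
    x, y = points[i]
    cx, cy = math.floor(x / eps), math.floor(y / eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if math.dist(points[i], points[j]) <= eps:
                    found.append(j)
    return found

def pick_seeds(points, i, candidates):
    """For each of eight 45-degree sectors around point i, keep the farthest candidate."""
    x, y = points[i]
    best = {}
    for j in candidates:
        px, py = points[j]
        angle = math.atan2(py - y, px - x) % (2 * math.pi)
        sector = int(angle // (math.pi / 4)) % 8
        d = math.dist(points[i], points[j])
        if sector not in best or d > best[sector][0]:
            best[sector] = (d, j)
    return [j for _, j in best.values()]

def cluster(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    grid = build_grid(points, eps)
    labels = [None] * len(points)  # None means not yet visited
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(region_query(points, grid, i, eps)) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border object
            continue
        labels[i] = cid
        seeds = deque([i])
        while seeds:
            s = seeds.popleft()
            neigh = region_query(points, grid, s, eps)
            if len(neigh) < min_pts:
                continue  # border object: stays in the cluster, is not expanded
            fresh = [j for j in neigh if labels[j] is None or labels[j] == NOISE]
            for j in fresh:
                labels[j] = cid
            # Only up to eight outermost objects are queried further.
            seeds.extend(pick_seeds(points, s, fresh))
        cid += 1
    return labels
```

Calling `cluster(points, eps, min_pts)` on a list of `(x, y)` tuples returns one label per point. The saving over standard DBSCAN expansion comes from `pick_seeds`: every unmarked neighbor of a core object joins the cluster, but at most eight of them trigger a further `region_query`.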