Computer Engineering and Applications, 2019, Vol. 55, Issue (18): 61-66. DOI: 10.3778/j.issn.1002-8331.1806-0356

• Theory, Research and Development •

Multi-Density Fast Clustering Algorithm Based on Region Partition (利用区域划分的多密度快速聚类算法)

NIU Shaozhang (牛少章), OU Yuyi (欧毓毅), LING Jie (凌捷), GU Guosheng (顾国生)

  1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2019-09-15 Published: 2019-09-11

Abstract: Grid-based clustering algorithms have two shortcomings: the grids on cluster edges contain noise points, and merging grids by relative density difference cannot distinguish grids whose density varies uniformly. To address these problems, a multi-density fast clustering algorithm based on region partition, called MFCBR, is proposed. The algorithm divides the data space into grids of different densities and merges the grids into clusters using a grid index table and the grid center density difference. For each cluster, it then computes the centroid of the boundary grid, the center of the boundary grid, and the center of the nearest cluster grid, and uses the relationship among the three to exclude the noise points contained in the boundary grid data. Experiments show that the algorithm reduces the interference of noise data with clustering and also performs comparatively well on multi-density data sets whose density varies uniformly.

Key words: region partition, grid, centroid, multi-density, clustering
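
The grid-and-merge idea described in the abstract can be illustrated with a minimal Python sketch. It is not the MFCBR implementation: the abstract does not define the grid center density difference or the boundary-noise rule, so the sketch takes the number of points in a cell as its density and merges neighbouring cells whose counts differ by at most an absolute threshold. The names `build_grid_index`, `merge_cells`, `cell_centroid`, `cell_size`, `min_pts` and `max_diff` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from itertools import product


def build_grid_index(points, cell_size):
    """Map each cell index (a tuple of ints) to the points that fall in it."""
    index = defaultdict(list)
    for p in points:
        cell = tuple(int(c // cell_size) for c in p)
        index[cell].append(p)
    return dict(index)


def merge_cells(index, min_pts, max_diff):
    """Grow clusters over neighbouring cells of similar density.

    Assumed rule (a stand-in, not taken from the paper): a neighbouring cell
    joins the cluster when it holds at least min_pts points and its point
    count differs from the current cell's count by at most max_diff.
    """
    if not index:
        return {}
    dims = len(next(iter(index)))
    # All neighbour offsets except the zero offset (the cell itself).
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    labels = {}
    next_label = 0
    # Visit the densest cells first so that clusters start from dense cores.
    for seed in sorted(index, key=lambda c: -len(index[c])):
        if seed in labels or len(index[seed]) < min_pts:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cell, off))
                if nb in labels or nb not in index:
                    continue
                if len(index[nb]) < min_pts:
                    continue  # sparse cell: left unlabelled
                if abs(len(index[cell]) - len(index[nb])) <= max_diff:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def cell_centroid(cell_points):
    """Mean position of the points in one cell."""
    n = len(cell_points)
    dims = len(cell_points[0])
    return tuple(sum(p[i] for p in cell_points) / n for i in range(dims))


points = [
    (0.1, 0.1), (0.2, 0.8), (0.7, 0.3), (0.9, 0.9),  # cell (0, 0)
    (1.1, 0.1), (1.2, 0.8), (1.7, 0.3), (1.9, 0.9),  # cell (1, 0)
    (5.1, 5.1), (5.2, 5.8), (5.7, 5.3), (5.9, 5.9),  # cell (5, 5)
    (9.5, 9.5),                                      # isolated point
]
index = build_grid_index(points, cell_size=1.0)
labels = merge_cells(index, min_pts=2, max_diff=1)
```

In this toy input the two adjacent dense cells receive the same label, the distant dense cell receives its own label, and the cell holding the isolated point stays unlabelled. A centroid-based boundary check, as outlined in the abstract, could compare `cell_centroid` of a boundary cell with the cell's geometric center, but the exact criterion would have to come from the full paper.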