Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (7): 141-146.DOI: 10.3778/j.issn.1002-8331.1509-0131

Previous Articles     Next Articles

Spectral clustering based on local density and geodesic distance

ZHANG Tao1, 2, GE Hongwei1, 2, SU Hui1, 2, ZHANG Huanqing2   

  1. 1.Key Laboratory of Advanced Process Control for Light Industry(Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China
    2.School of Internet of Things, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2017-04-01 Published:2017-04-01

基于局部密度和测地距离的谱聚类

张  涛1,2,葛洪伟1,2,苏  辉1,2,张欢庆2   

  1. 1.轻工过程先进控制教育部重点实验室(江南大学),江苏 无锡 214122
    2.江南大学 物联网工程学院,江苏 无锡 214122

Abstract: The traditional geodesic distance calculation method based on [K]-nearest neighbor graph can find the data relationships on manifold. However, if there are adhesive points between different clusters, the similarity based on traditional geodesic distance can not effectively reflect the true relationships between points, which make it difficult to cluster the datasets effectively. Since the similarity based on traditional geodesic distance can not deal with adhesive datasets effectively, spectral clustering based on local density and geodesic distance is proposed. Firstly, the local density of each point is calculated to find the nearest high density point, and choose marginal points and non-marginal points. Then, the edge between the marginal point and its nearest high density point is created, and edges between the [K]-nearest neighbor points and non-marginal points are created as well. Finally, the geodesic distance and similarity can be calculated to perform clustering. Experiments on artificial and UCI datasets show that the proposed algorithm can improve the clustering accuracy when dealing with the adhesive datasets.

Key words: [K]-nearest neighbor graph, geodesic distance, local density, similarity, spectral clustering

摘要: 传统根据[K]-近邻图计算测地距离的方法,虽然能够发现流形分布数据间的相似关系,但是当不同类的点存在粘连关系时,依此计算相似度时不能体现样本间的真实关系,从而无法有效聚类。针对传统测地距离计算相似度的方法不能有效处理粘连数据集的问题,提出了基于局部密度和测地距离的谱聚类方法。计算样本的局部密度,寻找每个样本点的最近高密度点,并选择边缘点和非边缘点;在边缘点和其最近高密度点之间构造边、非边缘点之间的[K]个近邻点构造边,依此计算测地距离和相似度并进行聚类。在人工数据集和UCI数据集上的实验表明,该算法在处理粘连数据集时有效提高了聚类准确率。

关键词: [K]-近邻图, 测地距离, 局部密度, 相似度, 谱聚类