Computer Engineering and Applications, 2018, Vol. 54, Issue (18): 184-187. DOI: 10.3778/j.issn.1002-8331.1705-0401

• Pattern Recognition and Artificial Intelligence •

Shared nearest neighbor affinity based clustering algorithm

QIU Baozhi, XIN Hang   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Online: 2018-09-15 Published: 2018-10-16

Abstract: Density-based clustering algorithms often produce inaccurate results on high-dimensional and multi-density datasets. To address this problem, a Shared Nearest Neighbor Affinity (SNNA) based clustering algorithm is proposed. The algorithm introduces k-nearest neighbors and shared nearest neighbors, and defines the shared nearest neighbor affinity as the local density measure of an object. It first extracts core points according to their affinity, then clusters the core points with breadth-first search, and finally assigns each non-core point to a cluster, which completes the clustering of the whole dataset. Experimental results show that the algorithm can find clusters of arbitrary shape, size, and density, and that SNNA achieves higher clustering accuracy than comparable algorithms on high-dimensional data.

Key words: clustering, density, shared nearest neighbor, affinity, data mining
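
The three stages named in the abstract (core point extraction by affinity, breadth-first clustering of core points, assignment of non-core points) can be illustrated with a minimal Python sketch. The abstract does not give the affinity formula, the core point threshold, or the linkage rule, so all three are assumptions here: affinity is taken as the number of neighbors a point shares with its own k nearest neighbors, core points are those above a quantile of the affinity values, and two core points are linked when either lists the other among its k nearest neighbors. The names `snna_sketch`, `knn_indices`, and `core_quantile` are hypothetical and do not come from the paper.

```python
from collections import deque

import numpy as np


def knn_indices(X, k):
    """Return (k nearest neighbor indices, full distance matrix), self excluded.

    Brute-force O(n^2) distances; fine for a small illustration only.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k], d


def snna_sketch(X, k=5, core_quantile=0.3):
    """Illustrative three-stage clustering; NOT the paper's exact definitions."""
    n = len(X)
    nn, d = knn_indices(X, k)
    nn_sets = [set(int(j) for j in row) for row in nn]

    # ASSUMED affinity: neighbors shared with each of a point's own k neighbors.
    affinity = np.array(
        [sum(len(nn_sets[i] & nn_sets[j]) for j in nn_sets[i]) for i in range(n)]
    )

    # Stage 1: core points reach an ASSUMED quantile of the affinity values.
    is_core = affinity >= np.quantile(affinity, core_quantile)

    # ASSUMED linkage: undirected k-nearest-neighbor graph.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in nn_sets[i]:
            adj[i].add(j)
            adj[j].add(i)

    # Stage 2: breadth-first search groups connected core points into clusters.
    labels = np.full(n, -1)
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if is_core[j] and labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1

    # Stage 3: each non-core point takes the label of its nearest core point
    # (an ASSUMED assignment rule).
    core_idx = np.flatnonzero(is_core)
    for i in np.flatnonzero(~is_core):
        labels[i] = labels[core_idx[np.argmin(d[i, core_idx])]]
    return labels
```

Calling `snna_sketch(X, k=5)` on an `(n, d)` NumPy array returns one integer cluster label per row. The quantile threshold is only a convenient stand-in for selecting dense points; the number of clusters it yields depends on `k` and `core_quantile` and should not be read as a reproduction of the reported results.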