Computer Engineering and Applications, 2021, Vol. 57, Issue (23): 200-210. DOI: 10.3778/j.issn.1002-8331.2107-0529

• Pattern Recognition and Artificial Intelligence •

Natural Neighbor Density Extremum Clustering Algorithm

ZHANG Zhonglin, ZHAO Yu, YAN Guanghui   

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China
  • Online: 2021-12-01 Published: 2021-12-02

自然邻居密度极值聚类算法

张忠林,赵昱,闫光辉   

  1. 兰州交通大学 电子与信息工程学院,兰州 730000

Abstract:

The density peak clustering algorithm can find non-spherical clusters of arbitrary shape, but when density varies widely across a data set, cluster centers in low-density regions are difficult to detect and the results are sensitive to parameter choice. To address these problems, a new density extremum clustering algorithm is proposed. First, the concept of natural neighbors is introduced to find the natural neighbors of each data object, and an ellipse model is defined to calculate the local density of the data in the natural stable state. Second, the cosine similarity of each data object is calculated and used to update the object's connected value, and the connected value is used to separate high-density regions, low-density regions, and outliers. Then, a density extremum function is constructed to find the cluster centers in the high-density and low-density regions. Finally, the non-center points of each region are merged into the cluster of the nearest cluster center. Experiments on synthetic data sets and UCI public data sets show that the algorithm achieves better results than the comparison algorithms on data sets with large differences in density distribution.
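
The first and last steps of the abstract can be illustrated with a minimal Python sketch. It assumes the formulation of natural neighbor search that is common in the literature (grow the neighborhood size r until the number of points with no reverse neighbor stops changing, then keep mutual neighbors), which may differ in detail from the procedure in the paper. The ellipse density model, the connected value, and the density extremum function are not specified in the abstract and are not reproduced here. The function names `natural_neighbors` and `assign_to_nearest_center` are hypothetical.

```python
import numpy as np


def natural_neighbors(X, max_r=None):
    """Return (r, neighbors); neighbors[i] is the set of natural neighbors of point i.

    Assumed formulation: stop once every point has a reverse neighbor, or once
    the number of points without one is unchanged between two rounds.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if max_r is None:
        max_r = n - 1
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1, kind="stable")

    knn = [set() for _ in range(n)]          # r nearest neighbors found so far
    reverse_count = np.zeros(n, dtype=int)   # how often each point is chosen
    prev_orphans = -1
    r = 0
    while r < max_r:
        r += 1
        for i in range(n):
            j = int(order[i, r - 1])         # r-th nearest neighbor of point i
            knn[i].add(j)
            reverse_count[j] += 1
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # Natural neighbors are mutual: j is in knn[i] and i is in knn[j].
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


def assign_to_nearest_center(X, center_idx):
    """Label each point with the position (in center_idx) of its nearest center."""
    X = np.asarray(X, dtype=float)
    centers = X[list(center_idx)]
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [2.2, 0.0], [10.0, 0.0]]
    r, nn = natural_neighbors(points)
    print(r, nn)
    print(assign_to_nearest_center(points, [0, 3]))
```

In this toy data set the isolated point at x = 10 ends up with no natural neighbors, which is the kind of signal an algorithm could use to flag outlier candidates. How the paper actually identifies outliers (via the connected value) is not described in enough detail in the abstract to reproduce.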

Key words: clustering, natural neighbor, density adaptive distance, anchor points, connected value, density extremum

摘要:

针对密度峰值聚类算法存在数据集密度差异较大时,低密度区域聚类中心难以检测和参数敏感的问题,提出了一种新型密度极值算法。引入自然邻居概念寻找数据对象自然近邻,定义椭圆模型计算自然稳定状态下数据局部密度;计算数据对象余弦相似性值,用余弦相似性值来更新数据对象连通值,采用连通值划分高低密度区域和离群点;构造密度极值函数找到高低密度不同区域聚类中心点;将不同区域非聚类中心点归并到离其最近的聚类中心所在簇中。通过在合成数据集和UCI公共数据集实验分析:该算法比其他对比算法在处理密度分布差异较大数据集上取得了更好的结果。

关键词: 聚类, 自然邻居, 密度自适应距离, 锚点, 连通值, 密度极值