Distribution Automatic Threshold Density Peak Clustering Algorithm

doi:10.3778/j.issn.1002-8331.1910-0262

Computer Engineering and Applications, 2021, Vol. 57, Issue (5): 71-78.

PENG Qihui, XUAN Shibin, GAO Qing

College of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China

Online: 2021-03-01  Published: 2021-03-02

Abstract:

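Density peak clustering (DPC) is a clustering method based on local density. Two basic factors determine how well DPC performs: the definition of local density and the selection of cluster centers. Classical DPC ignores how the sample points are distributed within a neighborhood when it defines local density, and it cannot select cluster centers automatically. To address these problems, this paper proposes a distribution-based definition of local density and an automatic cluster-center selection strategy based on the maximum between-class variance method (Otsu). For each sample point, the number of data points inside the circle whose radius is the cutoff distance is counted, and the distribution of those points is taken into account as well: when two circles contain the same number of points, the more uniformly the points are spread within a circle, the larger the local density of the point at its center and the more likely that point is to be a density peak. Otsu's method then selects a threshold automatically to find the cluster centers. Experimental results show that the new algorithm not only selects cluster centers automatically but also achieves higher classification accuracy than the original algorithm.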

Key words: clustering, density peak, automatic selection, cluster center
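The abstract does not give the exact formula for the distribution-based local density, so the sketch below is only an illustration under stated assumptions, not the authors' definition. It keeps the classical DPC cutoff count n_i (points strictly inside the cutoff distance d_c) and scales it by a hypothetical uniformity score u_i in [0, 1], taken here as the normalized entropy of how the neighbors fall into equal angular sectors around the point (2-D data only). The helper names `neighbors_within`, `uniformity` and `local_density`, the `sectors` parameter, and the factor (1 + u_i) are all assumptions made for illustration.

```python
import math


def neighbors_within(points, i, dc):
    """Indices of the points (other than i) strictly inside the cutoff distance dc."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if j != i and math.hypot(x - xi, y - yi) < dc]


def uniformity(points, i, nbrs, sectors=4):
    """HYPOTHETICAL uniformity score in [0, 1]: normalized entropy of the
    neighbors' angular-sector counts around point i (1 = perfectly even).
    `sectors` must be at least 2."""
    if not nbrs:
        return 0.0
    xi, yi = points[i]
    counts = [0] * sectors
    for j in nbrs:
        x, y = points[j]
        angle = math.atan2(y - yi, x - xi) % (2 * math.pi)  # map to [0, 2*pi)
        counts[min(int(angle / (2 * math.pi) * sectors), sectors - 1)] += 1
    n = len(nbrs)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c > 0)
    return entropy / math.log(sectors)  # divide by the maximum possible entropy


def local_density(points, dc, sectors=4):
    """Cutoff count n_i (as in classical DPC) scaled by the assumed factor (1 + u_i)."""
    rho = []
    for i in range(len(points)):
        nbrs = neighbors_within(points, i, dc)
        rho.append(len(nbrs) * (1.0 + uniformity(points, i, nbrs, sectors)))
    return rho
```

For example, with d_c = 1.5 a point surrounded by neighbors at (1, 1), (-1, 1), (-1, -1) and (1, -1), and a point whose four neighbors all lie just to its right, both have n_i = 4; under this assumed score the first gets ρ ≈ 8 and the second ρ = 4, which matches the ordering the abstract describes.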

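The abstract states only that Otsu's maximum between-class variance method chooses the threshold that separates cluster centers from the remaining points; it does not say which quantity is thresholded. The sketch below assumes, as in the classical DPC decision graph, that the threshold is applied to γ_i = ρ_i·δ_i, where δ_i is the distance from point i to its nearest point of higher density. For self-containment it uses the classical cutoff count for ρ_i; a distribution-aware density could be substituted. The function names `otsu_threshold` and `dpc_otsu`, the 64-bin histogram, and the rule that the densest point is always a center are assumptions of this sketch, not details taken from the paper.

```python
import math


def otsu_threshold(values, bins=64):
    """Otsu's method: return the histogram cut that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return hi
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum((k + 0.5) * hist[k] for k in range(bins))  # moments in bin units
    best_k, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for k in range(bins - 1):  # class 0 = bins 0..k, class 1 = bins k+1..end
        w0 += hist[k]
        sum0 += (k + 0.5) * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_k, best_var = k, between
    return lo + (best_k + 1) * width  # upper edge of class 0


def dpc_otsu(points, dc):
    """Classical DPC (cutoff density, delta, gamma) with Otsu-selected centers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first, stable on ties
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # usual convention for the global density peak
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
            delta[i], parent[i] = dist[i][j], j
    gamma = [r * d for r, d in zip(rho, delta)]
    t = otsu_threshold(gamma)
    centers = [i for i in order if gamma[i] > t]
    if order[0] not in centers:
        centers.insert(0, order[0])  # assumption: the densest point is always a center
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    for i in order:  # a point's denser parent is always labeled before it
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

On two well-separated blobs the γ values form two groups (a few large values at the blob centers, many small ones elsewhere), so Otsu's cut falls between them and exactly one center per blob is chosen without a hand-set threshold.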

PENG Qihui, XUAN Shibin, GAO Qing. Distribution Automatic Threshold Density Peak Clustering Algorithm[J]. Computer Engineering and Applications, 2021, 57(5): 71-78.
