Computer Engineering and Applications, 2020, Vol. 56, Issue (14): 45-51. DOI: 10.3778/j.issn.1002-8331.1908-0501

• Theory, Research and Development •

Improved Adaptive Parameter DBSCAN Clustering Algorithm

WANG Guang (王光), LIN Guoyu (林国宇)

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2020-07-15 Published: 2020-07-14

Abstract:

The traditional DBSCAN algorithm requires the Eps and MinPts parameters to be set manually, and poorly chosen values lead to low clustering accuracy. To address this problem, an improved density clustering algorithm with adaptive parameters is proposed. First, kernel density estimation is used to determine reasonable intervals for Eps and MinPts, and the number of clusters is determined by analyzing the local density characteristics of the data. Next, clustering is performed with parameter values drawn from these intervals. Finally, the silhouette coefficient is calculated for each clustering result that satisfies the cluster-number condition, and the parameters corresponding to the maximum silhouette coefficient are taken as optimal. Comparative experiments on four classical datasets show that the algorithm automatically selects the optimal Eps and MinPts parameters and improves accuracy by 6.1% on average.

Keywords: density clustering, DBSCAN algorithm, self-adaptive, kernel density estimation, parameter optimization
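
The final selection step described in the abstract (cluster with each candidate parameter pair, keep only results with the expected number of clusters, and pick the pair with the largest silhouette coefficient) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate Eps and MinPts values and the target cluster number are passed in directly, whereas the paper derives them from kernel density estimation and local density analysis, which is not reproduced here. The function names `dbscan`, `silhouette`, and `select_parameters` are hypothetical, and excluding noise points from the silhouette calculation is also an assumption.

```python
import math
from itertools import product


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs[j])
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points (assumption: noise is ignored)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j]) for j in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def select_parameters(points, eps_values, min_pts_values, n_clusters):
    """Return (score, eps, min_pts) with the largest silhouette, or None if nothing qualifies."""
    best = None
    for eps, min_pts in product(eps_values, min_pts_values):
        labels = dbscan(points, eps, min_pts)
        found = len(set(labels) - {-1})
        if found != n_clusters or found < 2:
            continue  # result does not satisfy the cluster-number condition
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, eps, min_pts)
    return best


# Toy data: two well-separated 4x4 grids of points.
blob_a = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
blob_b = [(5 + x * 0.1, 5 + y * 0.1) for x in range(4) for y in range(4)]
points = blob_a + blob_b
best = select_parameters(points, [0.05, 0.15, 0.5, 10.0], [3, 4], n_clusters=2)
print(best)
```

On the toy data, the smallest Eps candidate labels every point as noise and the largest merges both groups into one cluster, so both are rejected by the cluster-number condition before any silhouette coefficient is computed. The brute-force neighborhood search is quadratic in the number of points and is only meant for small illustrative inputs.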