Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (3): 80-86. DOI: 10.3778/j.issn.1002-8331.1506-0204

• Big Data and Cloud Computing •

一种基于k-均值的DBSCAN算法参数动态选择方法

王兆丰,单甘霖   

  1. 军械工程学院 电子与光学工程系,石家庄 050003
  • Publication date: 2017-02-01 Release date: 2017-05-11

k-means based method for dynamically selecting DBSCAN algorithm parameters

WANG Zhaofeng, SHAN Ganlin   

  1. Electronics and Optics Engineering Department, Ordnance Engineering College, Shijiazhuang 050003, China
  • Online:2017-02-01 Published:2017-05-11

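Summary: To address the problem of choosing the Eps and MinPts parameters of the DBSCAN clustering algorithm, a domain-independent method for selecting them dynamically is proposed. First, the dataset is given a preliminary clustering with the k-means algorithm, with the initial cluster centers determined by the max-min distance method. Second, for the k-means result, the distribution of distances between samples within each cluster is computed; the distance value that has the largest number of sample pairs is chosen as that cluster's Eps, and MinPts is then obtained from Eps. Finally, the DBSCAN algorithm is modified so that its running Eps value adapts to the Eps of the k-means cluster to which the current core point belongs. The approach is applied to cluster analysis of bitstreams under unknown-protocol conditions; the results show that satisfactory clustering is obtained without the user specifying Eps or MinPts, which improves the algorithm's applicability and accuracy.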
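
As a rough illustration of the max-min distance rule for choosing initial k-means centers mentioned above, the following minimal sketch picks each new center as the sample farthest from its nearest already-chosen center. The function name `max_min_centers`, the use of Euclidean distance, the fixed `k`, and the choice of the first sample as the starting center are all assumptions made for illustration; the text above does not state these details, and the paper's own procedure may differ.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with a max-min distance (farthest-first) rule.

    Hypothetical helper: Euclidean distance and a fixed k are assumptions.
    """
    centers = [points[0]]  # assumption: start from the first sample
    # min_dist[i] is the distance from points[i] to its nearest chosen center
    min_dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # the next center is the sample whose nearest center is farthest away
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
centers = max_min_centers(points, 3)
```

The returned centers could then seed any standard k-means loop; spreading the seeds apart this way tends to avoid starting two centers inside the same dense group.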

Keywords: clustering, DBSCAN (a classic density-based clustering algorithm), parameter selection, k-means algorithm, unknown protocol

Abstract: This paper proposes a domain-independent method for dynamically selecting the Eps and MinPts parameters of the DBSCAN algorithm. The dataset is first coarsely clustered with the k-means algorithm, with the initial centers chosen by the max-min distance method. The distribution of pairwise sample distances within each k-means cluster is then computed, and the distance value shared by the largest number of sample pairs is taken as that cluster's Eps. MinPts is then derived from the chosen Eps. DBSCAN is also modified so that its running Eps adapts to the Eps of the k-means cluster to which the current core point belongs. When these ideas are applied to clustering bitstreams of unknown protocols, the experimental results show that the improved DBSCAN yields satisfactory clustering results without the user specifying Eps or MinPts, improving the applicability and accuracy of the algorithm.
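
The Eps selection step described above can be sketched as follows: tally the pairwise distances inside one k-means cluster and take the distance value that accounts for the most sample pairs. This is only a sketch under stated assumptions. The function name `select_eps`, the Euclidean distance, the histogram `bin_width`, the use of the bin center as Eps, and the tie-break toward the smaller distance are not taken from the abstract. The rule for deriving MinPts from Eps is not given there either, so it is not sketched here.

```python
import math
from collections import Counter
from itertools import combinations


def select_eps(cluster, bin_width=0.5):
    """Return the center of the distance bin that holds the most sample pairs.

    Hypothetical helper: binning and Euclidean distance are assumptions.
    """
    counts = Counter()
    for a, b in combinations(cluster, 2):
        # map each pairwise distance to an integer histogram bin
        counts[int(math.dist(a, b) // bin_width)] += 1
    # most populated bin wins; ties go to the smaller distance
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (best_bin + 0.5) * bin_width


cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (4.0, 4.0)]
eps = select_eps(cluster)
```

In an adaptive variant of DBSCAN along the lines the abstract describes, one such value would be computed per k-means cluster and looked up according to the cluster of the current core point.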

Key words: clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, parameter selection, k-means algorithm, unknown protocol