Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (11): 137-141. DOI: 10.3778/j.issn.1002-8331.2009.11.042

• Database, Signal and Information Processing •

VDBSCAN: a varied-density clustering algorithm

ZHOU Dong, LIU Peng

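  1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433
  • Received: 2008-11-17 Revised: 2009-02-09 Published: 2009-04-11 Released: 2009-04-11
  • Corresponding author: ZHOU Dong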

VDBSCAN: varied density based clustering algorithm

ZHOU Dong, LIU Peng

  1. Department of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Received: 2008-11-17 Revised: 2009-02-09 Online: 2009-04-11 Published: 2009-04-11
  • Contact: ZHOU Dong

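Abstract: Traditional density-based clustering algorithms cannot identify and cluster several clusters of different densities. To address this, the varied-density clustering algorithm VDBSCAN is proposed. For datasets whose density is not uniform, it identifies clusters of different densities and clusters them simultaneously, avoiding both merged and missed clusters. The basic idea of VDBSCAN is to use the k-dist plot and DK analysis to select automatically a set of Eps values, one for each density level in the dataset, and to call DBSCAN once for each value. Different Eps values find clusters of different densities. Experiments on four two-dimensional datasets verify the effectiveness of VDBSCAN: they show that it can cluster datasets of non-uniform density, and that the automatic selection of the parameter Eps is itself effective and robust.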

Keywords: varied-density clustering algorithm, density-based clustering, DBSCAN, data mining

Abstract: Density-based clustering is widely used because its clusters are easy to understand and it places no restriction on cluster shape. However, existing density-based algorithms have trouble finding all the meaningful clusters in datasets with varied densities. This paper introduces a new algorithm, VDBSCAN, for analysing varied-density datasets. The basic idea of VDBSCAN is that, before the traditional DBSCAN algorithm is applied, the k-dist plot and DK (Difference between k-dists of neighboring points) analysis are used to select several values of the parameter Eps, one for each density. With different values of Eps, clusters of varied densities can be found simultaneously. Finally, four synthetic two-dimensional datasets are used for demonstration, and the experiments show that VDBSCAN is effective at clustering datasets of uneven density.

Keywords: Varied Density Based Clustering Algorithm (VDBSCAN), density-based clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), data mining
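
The idea summarised in the abstract (k-dist values, DK analysis, one DBSCAN call per selected Eps) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the exact DK criterion, so the jump rule below (`jump_ratio`), the choice of MinPts equal to k, the decision to run each pass only on points that earlier passes left unclustered, and all function names are assumptions made for illustration. Outlier handling in the last density level is omitted.

```python
import math

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded).
    Requires at least k + 1 points."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def select_eps(points, k, jump_ratio=1.0):
    """Assumed DK heuristic: sort the k-dists and compute DK, the difference
    between neighbouring values. A DK larger than jump_ratio times the
    current k-dist is treated as the end of a density level, and the k-dist
    just before the jump becomes that level's Eps."""
    s = sorted(k_dist(points, k))
    eps_values = []
    for a, b in zip(s, s[1:]):
        dk = b - a
        if dk > jump_ratio * a:
            eps_values.append(a)
    eps_values.append(s[-1])  # the last level ends at the largest k-dist
    return eps_values

def dbscan_pass(points, labels, eps, min_pts, next_id):
    """One plain DBSCAN pass over the points that are still unlabelled.
    Returns the next unused cluster id."""
    active = [i for i, lab in enumerate(labels) if lab is None]

    def region(i):
        # Eps-neighbourhood of point i, including i itself.
        return [j for j in active if math.dist(points[i], points[j]) <= eps]

    visited = set()
    for i in active:
        if i in visited:
            continue
        visited.add(i)
        seeds = region(i)
        if len(seeds) < min_pts:
            continue  # not a core point; may still join a cluster as a border point
        labels[i] = next_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = next_id  # core or border point joins the cluster
            if j in visited:
                continue
            visited.add(j)
            neighbours = region(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # only core points expand the cluster
        next_id += 1
    return next_id

def vdbscan(points, k=3, jump_ratio=1.0):
    """Run DBSCAN once per selected Eps, from the densest level to the
    sparsest. Points left as None are noise."""
    labels = [None] * len(points)
    next_id = 0
    for eps in select_eps(points, k, jump_ratio):
        next_id = dbscan_pass(points, labels, eps, k, next_id)
    return labels
```

Processing the Eps values in ascending order means the densest clusters are fixed first, so a later, larger Eps cannot merge them with sparser neighbours. A single global Eps would either miss the sparse clusters or merge the dense ones, which is the failure the abstract describes.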