Computer Engineering and Applications, 2017, Vol. 53, Issue (14): 130-137. DOI: 10.3778/j.issn.1002-8331.1609-0111


Improved SUBCLU subspace clustering algorithm for high dimensional data

LUO Jing, QIAN Xuezhong, HAN Lizhao, SONG Wei   

  1. Engineering Research Center of Internet of Things Technology Applications Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2017-07-15 Published: 2017-08-01



Abstract: SUBCLU is a subspace clustering algorithm for high-dimensional data. Its bottom-up search for clusters in maximal interesting subspaces, however, iteratively generates a large number of intermediate clusters, and generating them consumes a large amount of time. To address this problem, an improved algorithm, BDFS-SUBCLU (SUBCLU based on depth-first search with backtracking), is proposed. It uses a depth-first search with backtracking to find the clusters in maximal interesting subspaces, which avoids generating the intermediate clusters and reduces the time complexity. BDFS-SUBCLU also adds a constraint on core points in each subspace, which to some extent prevents adjacent clusters from being merged into one because of special data points. Experiments on synthetic and real datasets show that BDFS-SUBCLU improves on SUBCLU in both efficiency and accuracy.

Key words: SUBCLU, subspace clustering, high dimensional data, interesting subspace
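
The abstract gives no pseudocode, so the following is only an illustrative sketch of the general idea of a depth-first search with backtracking over subspaces. It is not the BDFS-SUBCLU algorithm from the paper. As simplifying assumptions, a subspace counts as "interesting" here if it contains at least one DBSCAN-style core point (at least `min_pts` points within distance `eps`), and the paper's additional core-point constraint is omitted. The function name `find_maximal_subspaces` and its parameters are hypothetical.

```python
from functools import lru_cache
from math import dist


def find_maximal_subspaces(points, eps, min_pts):
    """Return maximal subspaces (sorted tuples of dimension indices) that
    contain at least one core point, using depth-first search with backtracking.

    Assumption: "contains a core point" stands in for "interesting"."""
    n_dims = len(points[0])

    @lru_cache(maxsize=None)
    def has_core_point(dims):
        # Project every point onto the candidate subspace.
        proj = [tuple(p[d] for d in dims) for p in points]
        # A core point has at least min_pts points (itself included) within eps.
        return any(
            sum(dist(p, q) <= eps for q in proj) >= min_pts
            for p in proj
        )

    maximal = []

    def dfs(current):
        start = current[-1] + 1 if current else 0
        # Go deeper: append only larger indices so each subspace is visited once.
        for d in range(start, n_dims):
            candidate = current + (d,)
            if has_core_point(candidate):
                dfs(candidate)
            # Otherwise backtrack: adding dimensions can only increase
            # distances, so no superset of `candidate` has a core point.
        if current:
            # `current` is maximal if no single extra dimension keeps it dense.
            others = [d for d in range(n_dims) if d not in current]
            if not any(
                has_core_point(tuple(sorted(current + (d,)))) for d in others
            ):
                maximal.append(current)

    dfs(())
    return maximal


# Five points that are close in dimensions 0 and 1 but spread out in dimension 2.
demo_points = [
    (0.0, 0.0, 0.0),
    (0.1, 0.2, 10.0),
    (0.2, 0.1, 20.0),
    (0.1, 0.1, 30.0),
    (0.0, 0.2, 40.0),
]
print(find_maximal_subspaces(demo_points, eps=1.0, min_pts=3))  # prints [(0, 1)]
```

In this sketch the search reports only the maximal subspace (0, 1); the lower-dimensional subspaces (0,) and (1,) are visited on the way down but never clustered or reported, which is the kind of intermediate work the abstract says the depth-first strategy avoids.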
