Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 38-44. DOI: 10.3778/j.issn.1002-8331.2001-0309

• Theory, Research and Development •

High-Dimensional Data Clustering Algorithm Based on Extended Dissimilarity

WU Sen, HE Huixia, FAN Yanyan

  1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
  • Online: 2020-12-01 Published: 2020-11-30

Abstract:

CABOSFV is an effective clustering algorithm for high-dimensional data, but it tends to assign data objects to larger clusters. To address this problem, this paper proposes CABOSFV_D, a high-dimensional data clustering algorithm based on extended dissimilarity. The algorithm introduces an adjustment index p that extends the original sparse feature dissimilarity and reduces the influence of cluster size on object assignment. CABOSFV_D is also implemented with bit sets, which markedly improves its computational efficiency. Clustering experiments on several UCI benchmark datasets show that CABOSFV_D outperforms the original CABOSFV algorithm in both clustering quality and running time.

Key words: extended dissimilarity, CABOSFV, high-dimensional clustering, CABOSFV_D, bit set
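
The abstract does not give the dissimilarity formula, so the following is only an illustrative sketch and not the authors' implementation. It assumes the commonly cited CABOSFV definition for binary attributes, SFD(X) = e / (|X| · a), where a is the number of attributes equal to 1 for every object in X and e is the number of attributes on which the objects disagree. The way p enters below (as an exponent on the cluster size) is a hypothetical choice made for illustration only. The function names and the use of Python integers as bit sets are likewise assumptions.

```python
from math import inf


def popcount(bits: int) -> int:
    """Count the 1-bits in a non-negative integer used as a bit set."""
    return bin(bits).count("1")


def make_cluster(obj_bits: int):
    """Summary of a single-object cluster: (size, all-ones mask, mixed mask)."""
    return (1, obj_bits, 0)


def merge(c1, c2):
    """Merge two cluster summaries using only bitwise operations."""
    n1, s1, ns1 = c1
    n2, s2, ns2 = c2
    s = s1 & s2                    # attributes that are 1 for every object
    ns = ns1 | ns2 | (s1 ^ s2)     # attributes on which objects disagree
    return (n1 + n2, s, ns)


def dissimilarity(cluster, p: float = 1.0) -> float:
    """Sparse feature dissimilarity with a hypothetical exponent p on size.

    p = 1 gives the assumed original form e / (|X| * a); a smaller p weakens
    the size term (ASSUMPTION: the paper's exact formula is not shown here).
    """
    n, s, ns = cluster
    a = popcount(s)
    e = popcount(ns)
    if a == 0:
        return inf                 # no shared attribute: treat as maximally dissimilar
    return e / (n ** p * a)


if __name__ == "__main__":
    x = make_cluster(0b1101)
    y = make_cluster(0b0111)
    xy = merge(x, y)
    print(dissimilarity(xy, p=1.0))
    print(dissimilarity(xy, p=0.5))
```

In this sketch, merging 0b1101 and 0b0111 leaves two shared attributes and two mixed attributes, so with p = 1 the value is 2 / (2 · 2) = 0.5. Because the denominator grows with cluster size when p = 1, large clusters get lower dissimilarity for the same a and e, which is one way to read the bias toward larger clusters that the abstract describes.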