Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 38-44. DOI: 10.3778/j.issn.1002-8331.2001-0309


High Dimensional Data Clustering Algorithm Based on Extended Dissimilarity

WU Sen, HE Huixia, FAN Yanyan   

  1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
  Online: 2020-12-01; Published: 2020-11-30





CABOSFV is an effective clustering algorithm for high-dimensional data, but it tends to assign data objects to larger clusters. To address this problem, we propose CABOSFV_D, a high-dimensional data clustering algorithm based on extended dissimilarity. An adjustment index p is introduced to extend the original sparse feature dissimilarity and reduce the influence of cluster size on object assignment. In addition, CABOSFV_D is implemented with bit sets, which significantly improves its efficiency. Finally, experiments on multiple standard UCI datasets show that CABOSFV_D outperforms the traditional algorithm in both clustering quality and running time.
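
The bit-set idea can be illustrated with a minimal sketch. It assumes binary attributes, summarizes a set of objects by its size n, the attributes shared by every object, and the attributes present in at least one object, and assumes the commonly cited CABOSFV form of sparse feature dissimilarity, |NF| / (n * |EF|), where EF is the set of shared attributes and NF the set of attributes held by some but not all objects. The extended dissimilarity with adjustment index p is not specified in this abstract and is not reproduced. The names (SparseFeatureVector, dissimilarity, one_pass_cluster) and the simplified one-pass assignment loop are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseFeatureVector:
    """Hypothetical summary of a set of binary objects, stored as bit sets."""
    n: int       # number of objects in the set
    common: int  # bit set of attributes present in every object (EF)
    union: int   # bit set of attributes present in at least one object

    def merge(self, other: "SparseFeatureVector") -> "SparseFeatureVector":
        # Merging two summaries needs only an AND and an OR over the bit sets.
        return SparseFeatureVector(self.n + other.n,
                                   self.common & other.common,
                                   self.union | other.union)


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the size of the attribute set."""
    return bin(bits).count("1")


def from_object(bits: int) -> SparseFeatureVector:
    """Summary of a single object whose attributes are encoded in `bits`."""
    return SparseFeatureVector(1, bits, bits)


def dissimilarity(sfv: SparseFeatureVector) -> float:
    # ASSUMED form |NF| / (n * |EF|); NF = attributes in the union but not shared.
    a = popcount(sfv.common)
    e = popcount(sfv.union & ~sfv.common)
    if a == 0:
        return float("inf")  # no attribute shared by all objects
    return e / (sfv.n * a)


def one_pass_cluster(objects: List[int], threshold: float) -> List[int]:
    """Simplified one-pass loop: put each object into the cluster whose merged
    dissimilarity is smallest, or open a new cluster if that exceeds threshold."""
    clusters: List[SparseFeatureVector] = []
    labels: List[int] = []
    for bits in objects:
        obj = from_object(bits)
        best, best_d = -1, float("inf")
        for idx, cluster in enumerate(clusters):
            d = dissimilarity(cluster.merge(obj))
            if d < best_d:
                best, best_d = idx, d
        if best >= 0 and best_d <= threshold:
            clusters[best] = clusters[best].merge(obj)
            labels.append(best)
        else:
            clusters.append(obj)
            labels.append(len(clusters) - 1)
    return labels


if __name__ == "__main__":
    data = [0b1110, 0b1100, 0b0011, 0b0111]
    print(one_pass_cluster(data, 0.5))  # [0, 0, 1, 1]
```

Because n appears in the denominator of this assumed form, the same number of mismatched attributes costs less when an object joins a larger cluster, which is consistent with the size bias the abstract describes; the adjustment index p is meant to reduce that influence.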

Key words: extended dissimilarity, CABOSFV, high-dimensional clustering, CABOSFV_D, bit set


