Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 38-44. DOI: 10.3778/j.issn.1002-8331.2001-0309
WU Sen, HE Huixia, FAN Yanyan
武森,何慧霞,范岩岩
Abstract:
CABOSFV is an effective clustering algorithm for high-dimensional data, but it tends to allocate data objects to larger clusters. To address this problem, CABOSFV_D, a high-dimensional data clustering algorithm based on extended dissimilarity, is proposed. An adjustment index p is introduced to extend the original sparse feature dissimilarity and reduce the influence of cluster size on object allocation. In addition, CABOSFV_D is implemented with bit sets, which significantly improves the computational efficiency of the algorithm. Clustering experiments on multiple UCI standard datasets show that CABOSFV_D outperforms the original CABOSFV algorithm in both clustering quality and time efficiency.
Key words: extended dissimilarity, CABOSFV, high-dimensional clustering, CABOSFV_D, bit set
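The abstract does not give the formula, so the following is only a minimal sketch of how a bit-set implementation of a size-adjusted sparse feature dissimilarity might look. It assumes the commonly cited CABOSFV definition SFD(X) = e / (|X| * a), where a is the number of attributes equal to 1 in every object of X and e is the number of attributes on which the objects disagree, and it assumes that the adjustment index p enters as an exponent on the cluster size |X|. That placement of p, and the names `BitsetCluster` and `extended_sfd`, are illustrative assumptions, not the authors' code.

```python
def popcount(x: int) -> int:
    """Count the 1-bits of a non-negative integer used as a bit set."""
    return bin(x).count("1")


class BitsetCluster:
    """Hypothetical cluster summary: one bit per binary attribute."""

    def __init__(self) -> None:
        self.size = 0       # number of objects in the cluster
        self.all_ones = 0   # bits set in every object (bitwise AND)
        self.any_ones = 0   # bits set in at least one object (bitwise OR)

    def with_object(self, obj: int):
        """Summary the cluster would have if obj were added (no mutation)."""
        if self.size == 0:
            return 1, obj, obj
        return self.size + 1, self.all_ones & obj, self.any_ones | obj

    def add(self, obj: int) -> None:
        self.size, self.all_ones, self.any_ones = self.with_object(obj)


def extended_sfd(size: int, all_ones: int, any_ones: int, p: float = 1.0) -> float:
    """Assumed form: e / (size**p * a); p = 1 gives the assumed original SFD."""
    a = popcount(all_ones)             # attributes that are 1 in all objects
    e = popcount(any_ones ^ all_ones)  # attributes on which objects disagree
    if a == 0:
        return float("inf")            # no shared attribute: treat as maximally dissimilar
    return e / (size ** p * a)


# Usage: two objects over four binary attributes.
cluster = BitsetCluster()
cluster.add(0b1110)
cluster.add(0b0110)
d_original = extended_sfd(cluster.size, cluster.all_ones, cluster.any_ones, p=1.0)
d_adjusted = extended_sfd(cluster.size, cluster.all_ones, cluster.any_ones, p=2.0)
```

Under these assumptions, each candidate assignment costs a few bitwise operations and population counts instead of a pass over all attributes, which is one plausible reading of the efficiency gain the abstract attributes to bit sets. Changing p changes how strongly the cluster size scales the dissimilarity.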
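Abstract (translated from the Chinese):
CABOSFV is an effective clustering algorithm for high-dimensional data. To address the tendency of CABOSFV to assign data objects to larger clusters, a high-dimensional data clustering algorithm with extended dissimilarity (CABOSFV_D) is proposed. The algorithm introduces an adjustment index p to extend the original sparse dissimilarity and reduce the influence of cluster size on object assignment. In addition, CABOSFV_D is implemented with bit sets, which markedly improves its computational efficiency. Clustering experiments on multiple UCI standard datasets show that CABOSFV_D outperforms the original CABOSFV algorithm in both clustering quality and time efficiency.
Key words: extended dissimilarity, CABOSFV, high-dimensional clustering, CABOSFV_D, bit set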
WU Sen, HE Huixia, FAN Yanyan. High Dimensional Data Clustering Algorithm Based on Extended Dissimilarity[J]. Computer Engineering and Applications, 2020, 56(23): 38-44.
武森,何慧霞,范岩岩. 拓展差异度的高维数据聚类算法[J]. 计算机工程与应用, 2020, 56(23): 38-44.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2001-0309
http://cea.ceaj.org/EN/Y2020/V56/I23/38