Computer Engineering and Applications (计算机工程与应用), 2011, Vol. 47, Issue (34): 127-129.

• Database, Signal and Information Processing •

Improved CABOSFV clustering considering data sort

WU Sen (武森), WANG Jing (王静), TAN Yisong (谭一松)

  1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-12-01 Published: 2011-12-01

Abstract: CABOSFV is an efficient algorithm for clustering high-dimensional data based on sparse features, but its clustering quality is sensitive to the order in which the data are input. To address this problem, an improved CABOSFV clustering method that considers data sorting (CABOSFV_CS) is proposed. It defines a sparseness index to describe the sparse features of the data and sorts the data in ascending order of the sparseness index to improve the clustering quality of CABOSFV. Experiments on UCI benchmark data sets show that, compared with the traditional CABOSFV algorithm, CABOSFV_CS effectively improves clustering accuracy.

Key words: CABOSFV algorithm, high-dimensional data, sparse feature, clustering
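
The sort-then-cluster idea described in the abstract can be illustrated with a small Python sketch. The abstract does not give the definition of the sparseness index, so the sketch assumes a hypothetical one (the number of non-zero attributes of an object). The clustering step is likewise only a simplified, order-dependent one-pass procedure in the spirit of CABOSFV, with an assumed set dissimilarity e / (n * a); it is not the authors' implementation. All function names and the threshold parameter are illustrative.

```python
from typing import List, Sequence

Row = Sequence[int]  # one object described by binary (0/1) sparse-feature values


def sparseness_index(row: Row) -> int:
    """Hypothetical sparseness index: the number of non-zero attributes.

    Assumption for illustration only; the abstract does not define the index.
    """
    return sum(1 for v in row if v != 0)


def sort_by_sparseness(data: List[Row]) -> List[int]:
    """Return row indices in ascending order of the assumed sparseness index."""
    # sorted() is stable, so ties keep their original relative order.
    return sorted(range(len(data)), key=lambda i: sparseness_index(data[i]))


def set_dissimilarity(rows: List[Row]) -> float:
    """Assumed dissimilarity of a set of objects: e / (n * a).

    n = number of objects, a = attributes equal to 1 for every object,
    e = attributes whose values are not all the same.
    """
    n = len(rows)
    cols = list(zip(*rows))
    a = sum(1 for c in cols if all(v == 1 for v in c))
    e = sum(1 for c in cols if len(set(c)) > 1)
    if a == 0:
        # No shared attribute: identical rows cost 0, otherwise never merge.
        return 0.0 if e == 0 else float("inf")
    return e / (n * a)


def one_pass_cluster(data: List[Row], order: List[int], threshold: float) -> List[List[int]]:
    """Simplified one-pass clustering whose result depends on `order`."""
    clusters: List[List[int]] = []
    for i in order:
        best, best_d = None, float("inf")
        for k, members in enumerate(clusters):
            d = set_dissimilarity([data[j] for j in members] + [data[i]])
            # Keep the admissible cluster with the smallest dissimilarity.
            if d <= threshold and d < best_d:
                best, best_d = k, d
        if best is None:
            clusters.append([i])  # no admissible cluster: start a new one
        else:
            clusters[best].append(i)
    return clusters
```

Calling `one_pass_cluster(data, sort_by_sparseness(data), threshold)` runs the one-pass step on the sorted order, while passing `list(range(len(data)))` as `order` reproduces the unsorted baseline, so the two results can be compared on the same data.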