Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (10): 23-27.

• 博士论坛 • Previous Articles     Next Articles

Unsupervised dimensionality reduction based on fractal dimension and genetic algorithm

YAN Guang-hui,LI Zhan-huai   

  1. School of Computer Science,Northwestern Polytechnical University,Xi’an 710072,China
  • Received:2007-11-19 Revised:2007-12-24 Online:2008-04-01 Published:2008-04-01
  • Contact: YAN Guang-hui

基于遗传算法的无监督分形属性规约技术

闫光辉,李战怀   

  1. 西北工业大学 计算机学院,西安 710072
  • 通讯作者: 闫光辉

Abstract: Dimensionality reduction is the powerful method to tackle the“Curse of Dimensionality”.Genetic algorithms based feature subset selection technique is superior to traditional feature selection method in the dimensionality reduction of the high dimensional data set.However,it can not be used in the field of unsupervised learning such as clustering which has no class label to use.FDR(Fractal Dimensionality Reduction) is the new unsupervised feature selection method.But,it is infeasible in practice in the high dimensional data set for its multiple scanning of the data set and high time consume.Accordingly,the authors propose the GABUFSS(Genetic Algorithm Based Unsupervised Feature Subset Selection) algorithm which combines the genetic algorithm and the fractal dimensionality reduction technique to tackle the unsupervised feature subset selection problem in the high dimensional data set.The experimental results using synthetic and real life data set show that GABUFSS algorithm achieves better performance than FDR algorithm in the high dimensional data set and can find identical subsets additionally.

Key words: data mining, curse of dimensionality, genetic algorithm, dimensionality reduction, fractal

摘要: 属性规约是应对“维数灾难”的有效技术,分形属性规约FDR(Fractal Dimensionality Reduction)是近年来出现的一种无监督属性选择技术,令人遗憾的是其需要多遍扫描数据集,因而难于应对高维数据集情况;基于遗传算法的属性规约技术对于高维数据而言优越于传统属性选择技术,但其无法应用于无监督学习领域。为此,结合遗传算法内在随机并行寻优机制及分形属性选择的无监督特点,设计并实现了基于遗传算法的无监督分形属性子集选择算法GABUFSS(Genetic Algorithm Based Unsupervised Feature Subset Selection)。基于合成与实际数据集的实验对比分析了GABUFSS算法与FDR算法的性能,结果表明GABUFSS相对优于FDR算法,并具有发现等价结果属性子集的特点。

关键词: 数据挖掘, 维数灾难, 遗传算法, 属性规约, 分形