Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (1): 146-152. DOI: 10.3778/j.issn.1002-8331.1704-0008

• Pattern Recognition and Artificial Intelligence •

Feature selection based on improved quantum evolutionary algorithm

ZHOU Dan1,2, WU Chunming1

  1. Institute of Computer System Architecture, Zhejiang University, Hangzhou 310027, China
  2. Institute of Electrical Information, Taizhou Vocational and Technical College, Taizhou, Zhejiang 318000, China
  • Online: 2018-01-01 Published: 2018-01-15

Abstract: Feature selection is widely studied as a data preprocessing technique, but because the problem is NP-hard, no effective solution method has been found. The genetic algorithm, which is widely used in feature selection at present, is limited by its evolutionary mechanism. The Quantum Evolutionary Algorithm (QEA) is therefore applied to feature selection, and a feature selection algorithm based on an improved QEA is proposed. First, the QEA is improved to increase population diversity and enhance optimization performance. Then, a fitness function is constructed using the Fisher ratio and the feature dimension as the evaluation criteria for a feature subset. Finally, the feature selection algorithm is designed following the steps by which QEA solves an optimization problem. Three algorithms are compared on data sets from the UCI database in three groups of experiments: identifying important features, improving learning algorithm performance, and feature selection efficiency. The results show that the proposed algorithm can identify important features. As the feature dimension of the data set increases, its feature selection performance gradually surpasses that of the comparison algorithms, and on high-dimensional data sets its feature selection efficiency is clearly better than theirs.

Key words: feature selection, Quantum Evolutionary Algorithm (QEA), genetic algorithm, feature subset, feature dimension
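
The abstract names three ingredients: a Fisher-ratio criterion, a penalty on feature dimension, and a quantum evolutionary search. It does not give the paper's fitness formula or its specific QEA improvements, so the following is only a minimal sketch under assumptions. The function names, the weight `alpha`, the normalized form of the fitness, the simplified rotation rule, and the toy data are all hypothetical illustrations and are not taken from the paper.

```python
import math
import random


def fisher_scores(X, y):
    """Per-feature Fisher ratio: between-class scatter over within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))  # small term avoids division by zero
    return scores


def fitness(mask, scores, alpha=1.0):
    """Assumed fitness: share of total Fisher score kept, minus a subset-size penalty."""
    total = sum(scores) or 1.0
    kept = sum(s for s, m in zip(scores, mask) if m)
    return kept / total - alpha * sum(mask) / len(mask)


def qea_select(scores, pop_size=10, generations=50,
               delta=0.05 * math.pi, alpha=1.0, seed=0):
    """Simplified QEA: each Q-bit is an angle theta with P(bit = 1) = sin(theta)^2."""
    rng = random.Random(seed)
    n = len(scores)
    lo, hi = 0.01 * math.pi, 0.49 * math.pi  # clamp so no bit becomes fully frozen
    thetas = [[math.pi / 4] * n for _ in range(pop_size)]  # start at P(1) = 0.5
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        for theta in thetas:
            # Observe the Q-bit individual to obtain a binary feature mask.
            mask = [1 if rng.random() < math.sin(t) ** 2 else 0 for t in theta]
            fit = fitness(mask, scores, alpha)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
            # Rotate each differing Q-bit toward the best mask found so far.
            for j in range(n):
                if mask[j] != best_mask[j]:
                    step = delta if best_mask[j] == 1 else -delta
                    theta[j] = min(max(theta[j] + step, lo), hi)
    return best_mask, best_fit


# Hypothetical toy data: features 0 and 1 separate the classes, 2 and 3 do not.
X = [[0.0, 0.1, 1.0, 3.0], [0.2, 0.0, 2.0, 1.0], [0.1, 0.2, 3.0, 2.0],
     [5.0, 4.9, 1.0, 3.0], [5.2, 5.1, 2.0, 1.0], [5.1, 5.0, 3.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(X, y)
best_mask, best_fit = qea_select(scores)
print(best_mask, best_fit)
```

With this assumed fitness and `alpha = 1.0`, a feature improves the score only if its share of the total Fisher ratio exceeds `1 / n`, which is one simple way to trade discriminative power against subset size. A standard QEA uses a rotation-gate lookup table rather than the fixed step used here.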