Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (27): 131-134. DOI: 10.3778/j.issn.1002-8331.2010.27.036

• Database, Signal and Information Processing •

Research on the application and design of GA in feature selection

HE Shao-rong1, ZHU Hao-dong2,3

  1. Department of Computer Science, Sichuan University of Science & Engineering, Zigong, Sichuan 643000, China
  2. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
  3. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  • Received: 2009-04-07 Revised: 2009-06-08 Online: 2010-09-21 Published: 2010-09-21
  • Contact: HE Shao-rong

Abstract: Selecting a representative feature subset from a massive text collection is an NP-hard problem in text categorization, and genetic algorithms are often effective on problems of this kind. To overcome the "drift" and "premature convergence" problems of the traditional genetic algorithm, this article first introduces rough sets and, on that basis, designs the fitness function, an adaptive crossover operator, an adaptive mutation operator, and a reasonable termination condition. A feature selection algorithm is then built on the resulting genetic algorithm and validated on the corpus provided by Fudan University. The experimental results show that the proposed feature selection algorithm performs well.
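The abstract names the components of the algorithm but gives no formulas, so the following is only a minimal sketch of how a genetic algorithm with adaptive crossover and mutation rates could select a feature subset. Everything specific in it is an assumption made for illustration: the toy `fitness` function (a relevance sum minus a size penalty) stands in for the rough-set-based fitness of the paper, `adaptive_rate` is one simple linear scheme rather than the authors' operators, and the rate bounds, population size, and the "no improvement for `patience` generations" termination rule are hypothetical choices.

```python
import random

def fitness(mask, relevance, penalty=0.3):
    """Toy stand-in fitness: reward relevant features, penalise subset size."""
    return sum(r for bit, r in zip(mask, relevance) if bit) - penalty * sum(mask)

def adaptive_rate(f, f_max, f_avg, high, low):
    """Return `high` for below-average individuals; shrink linearly
    towards `low` as fitness f approaches the population best f_max."""
    if f < f_avg or f_max == f_avg:
        return high
    return low + (high - low) * (f_max - f) / (f_max - f_avg)

def select_features(relevance, pop_size=20, max_gen=100, patience=15, seed=0):
    """Evolve a 0/1 mask over the features; 1 means the feature is kept."""
    rng = random.Random(seed)
    n = len(relevance)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, relevance))
    stale = 0  # generations since the best individual last improved
    for _ in range(max_gen):
        scores = [fitness(m, relevance) for m in pop]
        f_max, f_avg = max(scores), sum(scores) / len(scores)

        def tournament():
            # Binary tournament: pick two at random, keep the fitter one.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return (pop[i], scores[i]) if scores[i] >= scores[j] else (pop[j], scores[j])

        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            (p1, f1), (p2, f2) = tournament(), tournament()
            c1, c2 = p1[:], p2[:]
            # Crossover rate adapts to the fitter of the two parents.
            pc = adaptive_rate(max(f1, f2), f_max, f_avg, high=0.9, low=0.5)
            if n > 1 and rng.random() < pc:
                cut = rng.randrange(1, n)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child, f_parent in ((c1, f1), (c2, f2)):
                # Mutation rate adapts to the fitness of the corresponding parent.
                pm = adaptive_rate(f_parent, f_max, f_avg, high=0.1, low=0.01)
                for k in range(n):
                    if rng.random() < pm:
                        child[k] ^= 1  # flip the bit
                children.append(child)
        pop = children[:pop_size]
        champion = max(pop, key=lambda m: fitness(m, relevance))
        if fitness(champion, relevance) > fitness(best, relevance):
            best, stale = champion, 0
        else:
            stale += 1
            if stale >= patience:  # termination: no recent improvement
                break
    return best
```

Calling `select_features` with a list of per-feature relevance scores returns a 0/1 mask of the same length. Lowering the rates for above-average individuals is one way to protect good solutions while still disturbing weak ones, which is the general idea behind adaptive operators as a guard against premature convergence; the paper's own operator definitions may differ.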

CLC Number: