Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (22): 107-110. DOI: 10.3778/j.issn.1002-8331.2009.22.035

• Database and Information Processing •

A Feature Selection Method Using PGA

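MA Chun-hua1, ZHU Hao-dong2

  1. Department of Computer Science and Technology, Suihua College, Suihua, Heilongjiang 152061
  2. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041
  • Received: 2009-04-08 Revised: 2009-06-05 Online: 2009-08-01 Published: 2009-08-01
  • Corresponding author: MA Chun-hua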

Feature selection method using PGA

MA Chun-hua1, ZHU Hao-dong2

  1. Department of Computer Science and Technology, Suihua College, Suihua, Heilongjiang 152061, China
  2. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  • Received: 2009-04-08 Revised: 2009-06-05 Online: 2009-08-01 Published: 2009-08-01
  • Contact: MA Chun-hua

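Abstract: Feature selection is one of the core steps of a text classification system. However, existing feature selection methods are all serial, and their time efficiency is low when they are applied to massive Chinese text data, so using a parallel strategy to improve the efficiency of feature selection has become a research hotspot. This paper designs in detail a parallel genetic algorithm for feature selection. The algorithm uses a genetic algorithm to search for features and a parallel strategy to evaluate feature subsets: the fitness of the individuals in the population is computed simultaneously on multiple computing nodes, so that a relatively representative feature subset is obtained relatively quickly. Experimental results show that the method is effective.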

Keywords: text classification, feature selection, genetic algorithm, parallel strategy

Abstract: Feature selection is one of the key steps in a text classification system. However, most existing feature selection methods are serial and are too slow when applied to massive Chinese text data sets, so improving the efficiency of feature selection by means of a parallel strategy has become a research hotspot. This paper designs in detail a Parallel Genetic Algorithm (PGA) for feature selection. The algorithm uses a genetic algorithm to search for features and calculates the fitness of feature subsets on multiple computing nodes at the same time, so it can quickly acquire feature subsets that are more representative. Experimental results show that the method is effective.

Key words: text categorization, feature selection, Genetic Algorithm (GA), parallel strategy
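
The scheme outlined in the abstract, a genetic search over feature subsets whose fitness values are computed concurrently, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: `toy_fitness` is a placeholder objective (the abstract does not give the fitness function), the per-feature `weights` are hypothetical relevance scores, the tournament selection, one-point crossover, and bit-flip mutation operators are generic choices, and a local `ThreadPoolExecutor` stands in for the paper's multiple computing nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_fitness(mask, weights, penalty=0.5):
    """Placeholder fitness (assumption): summed weight of the selected
    features minus a fixed penalty per selected feature."""
    return sum(w for bit, w in zip(mask, weights) if bit) - penalty * sum(mask)


def evaluate_population(population, fitness_fn, executor):
    """Score all individuals concurrently; executor.map keeps input order."""
    return list(executor.map(fitness_fn, population))


def select_features(weights, pop_size=20, generations=30,
                    mutation_rate=0.05, workers=4, seed=0):
    """Run a small GA over 0/1 feature masks; needs at least two features."""
    rng = random.Random(seed)
    n = len(weights)

    def fitness_fn(mask):
        return toy_fitness(mask, weights)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    # Hypothetical stand-in for multiple computing nodes: a local thread pool.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for _ in range(generations):
            # The parallel step: fitness of every individual is computed at once.
            scores = evaluate_population(population, fitness_fn, executor)
            for mask, score in zip(population, scores):
                if score > best_score:
                    best_mask, best_score = mask[:], score

            def pick():
                # Binary tournament selection between two random individuals.
                a, b = rng.randrange(pop_size), rng.randrange(pop_size)
                return population[a] if scores[a] >= scores[b] else population[b]

            next_population = [best_mask[:]]  # elitism: carry the best forward
            while len(next_population) < pop_size:
                parent1, parent2 = pick(), pick()
                cut = rng.randrange(1, n)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
                # Bit-flip mutation, applied independently to each gene.
                child = [1 - bit if rng.random() < mutation_rate else bit
                         for bit in child]
                next_population.append(child)
            population = next_population

    return best_mask, best_score
```

Calling `select_features([2.0, 0.1, 1.5, 0.2, 0.9, 0.0])` returns a 0/1 mask over the six hypothetical features together with its score. Because only the fitness evaluation runs in the pool while all random draws happen in the main thread, a fixed `seed` gives reproducible runs. For CPU-bound fitness functions in Python, a process pool or a cluster-backed executor would be the more realistic counterpart to the paper's computing nodes.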