Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (12): 130-132.

• Databases, Signal and Information Processing •

Optimized approach of feature selection based on information gain

LIU Qinghe, LIANG Zhengyou

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-04-21 Published: 2011-04-21

一种基于信息增益的特征优化选择方法

刘庆和,梁正友   

  1. 广西大学 计算机与电子信息学院,南宁 530004

Abstract: Feature selection is an essential step in text categorization and can effectively improve classification precision and efficiency. After analyzing the drawbacks of the traditional information gain (IG) approach, this paper proposes an optimized approach that incorporates frequency, concentration, and distribution into IG. Experimental results show that the improved IG approach outperforms the traditional IG approach in feature selection.

Key words: feature selection, information gain, frequency, concentration, distribution

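Abstract (translated from the Chinese): Feature selection is an important step in text classification and can effectively improve classification accuracy and efficiency. Building on a study of feature selection methods for text classification, the shortcomings of the information gain method are analyzed, and frequency, concentration, and distribution are applied to it, yielding an optimized feature selection method based on information gain. Experiments show that the method outperforms the traditional method in both classification effectiveness and performance.

Key words (translated from the Chinese): feature selection, information gain, frequency, concentration, distribution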
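
For orientation, the traditional IG score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard textbook definition, IG(t) = H(C) - H(C | t), using binary term presence. It is not the authors' optimized method: the abstract does not state how frequency, concentration, and distribution enter the score, so those factors are deliberately left out. The function name `information_gain` and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (base 2) of a class distribution given raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard IG of a term, treating the term as present/absent per document.

    docs   -- list of token sets, one per document (hypothetical input format)
    labels -- list of class labels aligned with docs
    """
    n = len(docs)
    h_class = entropy(Counter(labels).values(), n)

    # Class counts among documents that contain / do not contain the term.
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_term.values())
    n_without = n - n_with

    # Conditional entropy H(C | term presence), skipping empty partitions.
    h_cond = 0.0
    if n_with:
        h_cond += (n_with / n) * entropy(with_term.values(), n_with)
    if n_without:
        h_cond += (n_without / n) * entropy(without_term.values(), n_without)
    return h_class - h_cond


# Toy corpus (made up for illustration only).
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sport", "sport", "finance", "finance"]

ig_goal = information_gain(docs, labels, "goal")  # separates the classes perfectly
ig_team = information_gain(docs, labels, "team")  # occurs equally in both classes
```

In this toy corpus "goal" receives the maximum score and "team" receives zero, because presence of "team" says nothing about the class. Note that the score depends only on whether a term appears in a document, not on how often it appears or how it is spread across and within classes, which is the kind of gap the frequency, concentration, and distribution factors named in the abstract are meant to address.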