Computer Engineering and Applications (计算机工程与应用) ›› 2011, Vol. 47 ›› Issue (12): 130-132.

• Database, Signal and Information Processing •

一种基于信息增益的特征优化选择方法

刘庆和,梁正友   

  1. 广西大学 计算机与电子信息学院,南宁 530004
  • Received: 1900-01-01 Revised: 1900-01-01 Publication date: 2011-04-21 Release date: 2011-04-21

Optimized approach of feature selection based on information gain

LIU Qinghe, LIANG Zhengyou

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-04-21 Published:2011-04-21

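Abstract (from the Chinese): Feature selection is an important step in text categorization and can effectively improve classification accuracy and efficiency. Building on a study of feature selection methods for text categorization, the paper analyzes the shortcomings of the information gain method, applies frequency, concentration, and distribution to it, and proposes an optimized feature selection method based on information gain. Experiments show that the method outperforms the traditional method in both classification effectiveness and performance.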

Key words (from the Chinese): feature selection, information gain, frequency, concentration, distribution

Abstract: Feature selection is an essential step in text categorization that can effectively improve classification precision and efficiency. After analyzing the drawbacks of the traditional information gain (IG) approach, this paper proposes an optimized approach that incorporates frequency, concentration and distribution into IG. Experimental results show that the improved IG approach is superior to the traditional IG approach for feature selection.

Key words: feature selection, information gain, frequency, concentration, distribution
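
For readers unfamiliar with the baseline, the traditional IG score that the abstract refers to can be sketched as below. This is a minimal illustration of standard information gain over binary term presence, not the authors' optimized method: the abstract gives no formulas for the frequency, concentration, or distribution factors, so they are not reproduced here. The function names (`entropy`, `information_gain`, `select_features`) and the document representation (a set of terms paired with a class label) are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0.0 if empty)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(term, docs):
    """Standard IG of a term: H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs is a non-empty list of (set_of_terms, class_label) pairs
    (a hypothetical representation chosen for this sketch).
    """
    labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]
    p_with = len(with_term) / len(docs)
    conditional = (p_with * entropy(with_term)
                   + (1 - p_with) * entropy(without_term))
    return entropy(labels) - conditional


def select_features(docs, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocabulary = set().union(*(terms for terms, _ in docs))
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(t, docs), t))
    return ranked[:k]
```

Because this score only looks at whether a term is present in a document, it ignores how often the term occurs and how it is spread within and across classes, which is the kind of information the frequency, concentration, and distribution factors named in the abstract would presumably add.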