Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (10): 135-137.

• Database, Signal and Information Processing •

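Research on and improvement of a mutual-information-based text feature selection method

LIU Jian, ZHANG Wei-ming

  1. School of Information System and Management, National University of Defense Technology, Changsha 410073
  • Received: 2007-09-20 Revised: 2007-11-29 Published: 2008-04-01 Released: 2008-04-01
  • Corresponding author: LIU Jian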

Study and improvement of mutual information based text feature selection method

LIU Jian, ZHANG Wei-ming

  1. School of Information System and Management, National University of Defense Technology, Changsha 410073, China
  • Received: 2007-09-20 Revised: 2007-11-29 Online: 2008-04-01 Published: 2008-04-01
  • Contact: LIU Jian

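Abstract: The mutual information (MI) text feature selection method is compared experimentally with the information gain and chi-square statistic methods. The comparison shows that the main factor affecting the performance of the MI method is randomness in the feature selection process. The MI method is improved by adding a perturbation factor, which eliminates the effect of this randomness. Experiments show that the improved MI method has a clear advantage over the information gain and chi-square statistic methods.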

Keywords: mutual information, information gain, CHI, text classification, feature selection

Abstract: This paper experimentally compares text feature selection methods based on mutual information (MI), information gain (IG) and CHI. It finds that the main factor restraining the performance of the MI method is randomness, and proposes an improvement. Experiments show that the improved method performs well and outperforms the IG and CHI methods.

Key words: mutual information, information gain, CHI, text classification, feature selection
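
For orientation, the baseline MI criterion discussed in the abstract can be sketched as follows. This is a minimal illustration of the commonly used document-frequency estimate MI(t, c) = log(A·N / ((A + C)(A + B))), not the improved method proposed in the paper, whose perturbation factor is not specified on this page. The function names `mi_scores` and `select_top_k`, the toy corpus, and the alphabetical tie-break are all assumptions made for the example.

```python
import math
from collections import Counter

def mi_scores(docs, labels):
    """Score each (term, category) pair seen together in at least one document.

    Uses the common estimate MI(t, c) = log(A * N / ((A + C) * (A + B))), where
    A = documents of category c containing t, A + B = documents containing t,
    A + C = documents of category c, and N = total number of documents.
    """
    n = len(docs)
    cat_count = Counter(labels)      # A + C for each category
    term_count = Counter()           # A + B for each term
    joint = Counter()                # A for each (term, category)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):     # count document frequency, not term frequency
            term_count[term] += 1
            joint[(term, label)] += 1
    return {
        (term, cat): math.log(a * n / (term_count[term] * cat_count[cat]))
        for (term, cat), a in joint.items()
    }

def select_top_k(scores, k):
    """Keep the k terms with the highest per-category maximum MI score."""
    best = {}
    for (term, _cat), s in scores.items():
        best[term] = max(s, best.get(term, float("-inf")))
    # Assumption for this sketch: ties are broken alphabetically so that the
    # result is deterministic. This is NOT the paper's perturbation factor.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]

# Hypothetical toy corpus: tokenized documents with one category label each.
docs = [["goal", "match", "team"], ["team", "coach"],
        ["stock", "market"], ["market", "team"]]
labels = ["sport", "sport", "finance", "finance"]

scores = mi_scores(docs, labels)
top_terms = select_top_k(scores, 2)
```

In this toy corpus several terms receive exactly the same score, so which ones survive a top-k cut depends entirely on how ties are ordered. That is one simple way an MI ranking can behave arbitrarily, although the page does not say whether this is the randomness the authors have in mind.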