Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (13): 130-133.

• Databases, Signal and Information Processing •

Mutual information maximization based feature selection algorithm in text classification

TANG Liang1, DUAN Jian-guo2, XU Hong-bo2, LIANG Ling1

  1. Information Engineering College, PLA Information Engineering University, Zhengzhou 450002, China
  2. Department of Network Science and Technology, Institute of Computing Technology, CAS, Beijing 100080, China
  • Received: 2007-10-25 Revised: 2008-01-29 Online: 2008-05-01 Published: 2008-05-01
  • Contact: TANG Liang

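Chinese title: 基于互信息最大化的特征选择算法及应用 (A feature selection algorithm based on mutual information maximization and its application)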

Abstract: Guided by the principle of mutual information maximization, this paper derives and analyzes a new feature selection algorithm based on an information-theoretic model, called the mutual information maximization based feature selection algorithm (MaxMI). The basic idea is that, after feature selection, as much information about the categories as possible should be retained. In its form of expression, the algorithm resembles the traditional information gain, mutual information, and cross-entropy criteria, but it is not identical to them. Experiments show that MaxMI outperforms these three algorithms.

Key words: text classification, feature selection, cross-entropy, information gain, mutual information maximization
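
To make the idea of "retaining information about the categories" concrete, the sketch below scores each term by the textbook mutual information I(T; C) between term presence T and category C, then keeps the highest-scoring terms. This is an illustration of the general principle only: the abstract does not give the MaxMI criterion itself, so the scoring formula, the function names, and the toy corpus here are all assumptions and should not be read as the authors' algorithm.

```python
import math
from collections import Counter

def term_class_mutual_information(docs, labels, term):
    """Textbook I(T; C) in bits, where T is True if `term` occurs in a document.

    Illustrative only: this is NOT the MaxMI criterion from the paper.
    """
    n = len(docs)
    # Joint counts over (term present?, category).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    t_counts, c_counts = Counter(), Counter()
    for (t, c), k in joint.items():
        t_counts[t] += k
        c_counts[c] += k
    mi = 0.0
    for (t, c), k in joint.items():
        # p(t, c) * log2(p(t, c) / (p(t) * p(c))), written with raw counts.
        mi += (k / n) * math.log2(k * n / (t_counts[t] * c_counts[c]))
    return mi

def select_top_k(docs, labels, k):
    """Return the k terms with the highest I(T; C); ties keep alphabetical order."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda w: term_class_mutual_information(docs, labels, w),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sports", "sports", "finance", "finance"]

print(select_top_k(docs, labels, 2))
```

In this toy corpus "goal" and "stock" each determine the category exactly, so they carry one full bit about it, while "team" appears equally in both categories and carries none. A selection rule built on this kind of score keeps the terms that say the most about the category, which is the intuition the abstract describes.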
