Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (13): 145-148.


Text feature weighting method based on mutual information

FAN Xiaochao1,2, ZHANG Chongyang1, DENG Xiongwei1   

  1. College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210018, China
  2. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China
  • Online: 2015-07-01  Published: 2015-06-30

基于互信息的文本特征加权方法

樊小超1,2,张重阳1,邓雄伟1   

  1. 南京理工大学 计算机科学与工程学院,南京 210018
  2. 新疆师范大学 计算机科学技术学院,乌鲁木齐 830054

Abstract: Feature weighting is an important step in text categorization. An examination of traditional feature selection functions shows that mutual information performs particularly well when used for feature weighting. To further improve its performance in this role, the paper incorporates term frequency information, document frequency information, and a category correlation factor, and proposes a feature weighting method based on improved mutual information. Experiments show that the proposed method achieves better classification performance than traditional feature weighting methods.

Key words: text categorization, feature selection, feature weighting, mutual information

摘要: 特征加权是文本分类中的重要环节,通过考察传统的特征选择函数,发现互信息方法在特征加权过程中表现尤为突出。为了提高互信息方法在特征加权时的性能,加入了词频信息、文档频率信息以及类别相关度因子,提出了一种基于改进的互信息特征加权方法。实验结果表明,该方法比传统的特征加权方法具有更好的分类性能。

关键词: 文本分类, 特征选择, 特征加权, 互信息
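
The abstract does not give the improved weighting formula, so the following is only a minimal sketch of the baseline idea it starts from: the classic pointwise mutual information between a term and a category, estimated from document counts, and then used as a per-term weight. Multiplying by raw term frequency is an illustrative assumption, not the paper's method; the paper's document frequency and category correlation factors are not reproduced here. The function names `mutual_information` and `tf_mi_weights` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term, category):
    """Pointwise MI log(P(t, c) / (P(t) * P(c))), estimated from document counts.

    docs is a list of token lists; labels holds one category label per document.
    """
    n = len(docs)
    # Documents that belong to the category and contain the term.
    n_both = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    n_term = sum(1 for d in docs if term in d)
    n_cat = sum(1 for y in labels if y == category)
    if n_both == 0:
        # Assumption: treat the undefined log(0) case as a weight of zero.
        return 0.0
    return math.log((n_both * n) / (n_term * n_cat))


def tf_mi_weights(doc, docs, labels, category):
    """Weight each term of doc by term frequency times non-negative MI.

    Illustrative assumption only: a plain TF x MI product, clipped at zero.
    """
    tf = Counter(doc)
    return {
        t: tf[t] * max(mutual_information(docs, labels, t, category), 0.0)
        for t in tf
    }


if __name__ == "__main__":
    corpus = [["ball", "goal"], ["ball", "team"], ["vote", "law"], ["vote", "ball"]]
    cats = ["sport", "sport", "politics", "politics"]
    print(tf_mi_weights(["ball", "ball", "goal"], corpus, cats, "sport"))
```

In this toy corpus, a term that occurs only in one category ("goal") receives a higher mutual information score than a term spread across categories ("ball"), which is the behavior a category-aware weight is meant to capture.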