Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (34): 123-125. DOI: 10.3778/j.issn.1002-8331.2010.34.037

• Database, Signal and Information Processing •

Study on mutual information-based feature selection in text categorization

FAN Xiao-li (范小丽), LIU Xiao-xia (刘晓霞)

  1. College of Information Science & Technology, Northwest University, Xi'an 710127, China
  • Received: 2010-05-18 Revised: 2010-07-06 Online: 2010-12-01 Published: 2010-12-01
  • Contact: FAN Xiao-li

Abstract: Mutual information-based feature selection does not combine positively correlated and negatively correlated features well, which degrades classification performance on unbalanced corpora. To address this problem, a balance factor is used to adjust the ratio of positively to negatively correlated features, strengthening the role of negatively correlated features during feature selection. A feature distribution difference factor is also introduced to distinguish features that are strongly related to a category, which improves classification performance. Experimental results show that the improved mutual information-based feature selection method is feasible and effective.
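
The idea of weighting positively and negatively correlated features can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the score uses the standard pointwise mutual information between a term and a category, the `smooth` constant is added only to avoid taking the logarithm of zero, and the `lam` weighting in `balanced_score` is a hypothetical stand-in for the balance factor, not the authors' definition. The feature distribution difference factor is not modelled.

```python
import math


def pmi(n_tc, n_t, n_c, n, smooth=0.5):
    """Pointwise mutual information between term t and category c.

    n_tc: documents in category c that contain t
    n_t:  documents that contain t (must be > 0)
    n_c:  documents in category c (must be > 0)
    n:    total number of documents
    smooth: assumed additive constant so that n_tc == 0 does not yield log(0)
    """
    return math.log(((n_tc + smooth) * n) / (n_t * n_c))


def balanced_score(mi, lam=0.5):
    """Hypothetical balance-factor weighting (an assumption, not the paper's formula).

    A positive score (term over-represented in the category) is weighted by lam;
    a negative score (term under-represented) is weighted by 1 - lam, and its
    magnitude is used so that strong negative evidence can also rank highly.
    """
    if mi >= 0:
        return lam * mi
    return (1.0 - lam) * (-mi)


# Usage: a term concentrated in the category and a term that avoids it.
positive = pmi(n_tc=40, n_t=50, n_c=100, n=1000)
negative = pmi(n_tc=1, n_t=200, n_c=100, n=1000)
scores = [balanced_score(positive, lam=0.6), balanced_score(negative, lam=0.6)]
```

Lowering `lam` in this sketch gives more weight to negatively correlated terms, which is the direction the abstract describes for unbalanced corpora.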
