Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (14): 136-137.DOI: 10.3778/j.issn.1002-8331.2009.14.041

• Database, Signal and Information Processing •

Improved χ2 statistics method for text feature selection

XIAO Ting, TANG Yan

  1. School of Computer & Information Science, Southwest University, Chongqing 400715, China
  • Received: 2008-11-20 Revised: 2009-02-13 Online: 2009-05-11 Published: 2009-05-11
  • Contact: XIAO Ting

改进的χ2统计文本特征选择方法

肖 婷,唐 雁   

  1. 西南大学 计算机与信息科学学院,重庆 400715
  • Corresponding author: 肖 婷

Abstract: Feature selection is an active topic in current research, especially in text categorization. The χ2 statistic method has two defects: it lowers the weight of low-frequency words, and it raises the weight, within a designated class, of features that rarely appear in that class but are common in other classes. This paper improves the χ2 statistic method to address both defects. In simulation and comparison experiments, the improved method gives better classification results than the traditional method.

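Abstract (translated from the Chinese): Feature selection is a hot topic in current research, particularly in text classification. The χ2 statistic method suffers from two defects: it reduces the weight of low-frequency words, and it increases the weight, for a given class, of features that seldom occur in that class but are widespread in the other classes. The χ2 statistic method is improved to address these defects, and simulation and comparison experiments are used to compare the effect of the original and improved methods on text classification. In these experiments, the improved method classifies better than the traditional method.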
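
To make the second defect concrete, here is a minimal Python sketch of the χ2 score in its commonly used contingency-table form, χ2(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The abstract does not state the formula or the authors' improved version, so the function names, the helper `association_sign`, and the document counts below are illustrative assumptions, not the paper's method.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of a term t for a class k.

    a: documents in class k that contain t
    b: documents outside class k that contain t
    c: documents in class k that do not contain t
    d: documents outside class k that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def association_sign(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical helper: +1 if t is positively associated with the class,
    -1 if negatively associated, 0 if independent. The squared term in
    chi_square discards exactly this sign."""
    diff = a * d - c * b
    return (diff > 0) - (diff < 0)


# Made-up corpus: 100 documents in the class, 900 outside it.
typical = (80, 20, 20, 880)   # term concentrated in the class
absent = (1, 800, 99, 100)    # term rare in the class, common elsewhere

typical_score = chi_square(*typical)
absent_score = chi_square(*absent)
```

With these counts, both terms receive a large positive score even though the second term almost never occurs in the class: squaring AD − CB hides the direction of the association. That is the behaviour the abstract describes as raising the weight of features that rarely appear in the designated class but are common in the others.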