Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (5): 132-135.

New feature selection approach for text categorization

ZHANG Yufang, WANG Yong, LIU Ming, XIONG Zhongyang   

  1. College of Computer, Chongqing University, Chongqing 400044, China
  • Online: 2013-03-01 Published: 2013-03-14

新的文本分类特征选择方法研究

张玉芳,王  勇,刘  明,熊忠阳   

  1. 重庆大学 计算机学院,重庆 400044

Abstract: Feature reduction is an important step in text categorization. Building on existing feature selection approaches, this paper considers how a feature is distributed between the positive class and the negative class and combines four indicators that measure a feature's ability to distinguish categories, proposing a new feature selection approach named Composite Ratio (CR). Experiments using the K-Nearest Neighbor (KNN) algorithm to examine the effectiveness of CR show that the approach achieves better dimension reduction than existing feature selection methods.

Key words: feature reduction, text categorization, feature selection, Composite Ratio (CR), K-Nearest Neighbor (KNN) algorithm

摘要: 特征降维是文本分类过程中的一个重要环节。在现有特征选择方法的基础上,综合考虑特征词在正类和负类中的分布性质,综合四种衡量特征类别区分能力的指标,提出了一个新的特征选择方法,即综合比率(CR)方法。实验采用K-最近邻分类算法(KNN)来考查CR方法的有效性,实验结果表明该方法能够取得比现有特征选择方法更优的降维效果。

关键词: 特征降维, 文本分类, 特征选择, 综合比率, K-最近邻分类算法