Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (7): 128-131.

• Database, Signal and Information Processing •

Feature selection based on feature distinguish ability and meta-information

WANG Xing, ZHANG Wenpeng   

  1. School of Software, Nanyang Normal University, Nanyang, Henan 473061, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2012-03-01 Published:2012-03-01

Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the categorization results. After analyzing the shortcomings of the term frequency and document frequency methods, this paper proposes a measure of feature distinguishing ability based on both. Meta-information is introduced into rough sets, and an attribute reduction algorithm based on meta-information is presented. The two are combined into a comprehensive feature selection method: it first uses feature distinguishing ability for preliminary selection, filtering out some terms to reduce the sparsity of the feature space, and then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the proposed method has certain advantages, to some extent.
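
The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the score in `discrimination_scores` (per-class relative term frequency times relative document frequency, taken as a max-minus-min gap across classes) is a hypothetical stand-in for the authors' feature distinguishing ability, and the second stage uses classical rough-set positive-region reduction on term presence rather than the meta-information-based algorithm. All function names and the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def discrimination_scores(docs, labels):
    """Stand-in score (an assumption, not the paper's formula): for each term,
    the gap between its largest and smallest per-class value of
    (relative term frequency) * (relative document frequency)."""
    classes = sorted(set(labels))
    tf = {c: Counter() for c in classes}   # term counts per class
    df = {c: Counter() for c in classes}   # document counts per class
    n_docs = Counter(labels)
    for tokens, c in zip(docs, labels):
        tf[c].update(tokens)
        df[c].update(set(tokens))
    n_tokens = {c: sum(tf[c].values()) for c in classes}
    vocab = set().union(*(tf[c] for c in classes))
    scores = {}
    for t in vocab:
        per_class = [(tf[c][t] / n_tokens[c]) * (df[c][t] / n_docs[c])
                     for c in classes]
        scores[t] = max(per_class) - min(per_class)
    return scores


def positive_region_size(docs, labels, terms):
    """Classical rough-set positive region: the number of documents whose
    presence/absence pattern over `terms` occurs in only one class."""
    classes_seen = defaultdict(set)
    sizes = Counter()
    for tokens, c in zip(docs, labels):
        present = set(tokens)
        key = tuple(t in present for t in terms)
        classes_seen[key].add(c)
        sizes[key] += 1
    return sum(n for key, n in sizes.items() if len(classes_seen[key]) == 1)


def select_features(docs, labels, k):
    """Stage 1: keep the k highest-scoring terms.
    Stage 2: drop terms, weakest first, whenever the positive region is unchanged."""
    scores = discrimination_scores(docs, labels)
    kept = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    target = positive_region_size(docs, labels, kept)
    for t in reversed(list(kept)):
        trial = [u for u in kept if u != t]
        if positive_region_size(docs, labels, trial) == target:
            kept = trial  # t was redundant given the remaining terms
    return kept


# Toy corpus, invented for illustration only.
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "team"],
    ["vote", "party", "team"],
    ["vote", "party", "law"],
]
labels = ["sport", "sport", "politics", "politics"]

print(select_features(docs, labels, k=4))  # prints ['goal']
```

On this toy corpus the first stage keeps `team`, `goal`, `party`, and `vote`; the second stage then finds that `goal` alone already separates the two classes, so the other three are discarded as redundant. Real corpora would need a tolerance on the positive region and a more careful elimination order, which this sketch does not attempt.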