Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (7): 128-131.

• Database, Signal and Information Processing •

Feature selection based on feature distinguish ability and meta-information

WANG Xing, ZHANG Wenpeng

  1. School of Software, Nanyang Normal University, Nanyang, Henan 473061, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-03-01 Published: 2012-03-01

Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the classification results. Based on an analysis of the shortcomings of the word-frequency and document-frequency methods, a measure called feature distinguishing ability is proposed. Meta-information is introduced into rough sets, an attribute reduction algorithm based on meta-information is proposed, and a comprehensive feature selection method is given. The method first uses feature distinguishing ability for a preliminary selection that filters out some terms and so reduces the sparsity of the feature space. It then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the proposed feature selection method has certain advantages.
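The two-stage structure described in the abstract (filter first, then remove redundancy) can be illustrated with a minimal Python sketch. The abstract gives neither the formula for feature distinguishing ability nor the meta-information-based reduction algorithm, so both are replaced here by labeled stand-ins: the score is assumed to be the spread of a term's per-class document frequency, and the reduction is a plain greedy rough-set step that drops a term whenever the number of consistently classified documents (the positive region) is unchanged. All function names, the threshold, and the toy corpus are hypothetical.

```python
from collections import defaultdict


def distinguishing_score(docs, labels, term):
    """Stand-in score (assumption, not the paper's formula):
    spread of the term's document frequency across classes."""
    hits = defaultdict(list)
    for doc, label in zip(docs, labels):
        hits[label].append(1 if term in doc else 0)
    rates = [sum(v) / len(v) for v in hits.values()]
    return max(rates) - min(rates)


def preselect(docs, labels, vocab, threshold):
    """Stage 1: keep only terms whose score reaches the threshold."""
    return [t for t in vocab if distinguishing_score(docs, labels, t) >= threshold]


def positive_region_size(docs, labels, features):
    """Count documents whose feature signature occurs in one class only."""
    groups = defaultdict(list)
    for doc, label in zip(docs, labels):
        signature = tuple(t in doc for t in features)
        groups[signature].append(label)
    return sum(len(v) for v in groups.values() if len(set(v)) == 1)


def reduce_features(docs, labels, features):
    """Stage 2 (generic rough-set stand-in): greedily drop each term whose
    removal leaves the positive region unchanged."""
    target = positive_region_size(docs, labels, features)
    kept = list(features)
    for term in features:
        trial = [f for f in kept if f != term]
        if positive_region_size(docs, labels, trial) == target:
            kept = trial
    return kept


# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"goal", "match", "team"},
    {"goal", "team", "the"},
    {"vote", "party", "the"},
    {"vote", "party", "election"},
]
labels = ["sport", "sport", "politics", "politics"]
vocab = sorted(set().union(*docs))

selected = preselect(docs, labels, vocab, threshold=0.75)
reduct = reduce_features(docs, labels, selected)
print(selected)
print(reduct)
```

On this toy corpus the first stage discards terms that occur about equally often in both classes, and the second stage collapses terms that carry the same class information into a single representative. The result of the greedy pass depends on the order in which terms are tried, which is one reason a principled ordering criterion such as the paper's meta-information would matter in practice.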