Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (7): 128-131.
• Database, Signal and Information Processing •
WANG Xing, ZHANG Wenpeng
王 兴,张文鹏
Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the categorization results. After analyzing the shortcomings of the word frequency and document frequency methods, this paper proposes a feature distinguishing ability measure based on both. Meta-information is introduced into rough sets, and an attribute reduction algorithm based on meta-information is presented. The two are combined into a comprehensive feature selection method. The method first uses the feature distinguishing ability to preselect features, filtering out some terms to reduce the sparsity of the feature space, and then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. The experimental results show that the proposed method offers advantages to a certain extent.
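The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: `distinguishing_scores` is a hypothetical score that combines word frequency and document frequency, and `greedy_reduct` is a standard QuickReduct-style rough-set reduction driven by the dependency degree, used as a stand-in for the paper's meta-information algorithm, which is not specified on this page.

```python
from collections import Counter, defaultdict


def distinguishing_scores(docs, labels):
    """Hypothetical score (not the paper's formula): for each term, the max
    over classes of (share of the term's occurrences in the class) times
    (fraction of the class's documents that contain the term)."""
    class_sizes = Counter(labels)
    tf = defaultdict(Counter)  # tf[term][cls]: occurrences of term in class
    df = defaultdict(Counter)  # df[term][cls]: class documents containing term
    for tokens, cls in zip(docs, labels):
        for term, count in Counter(tokens).items():
            tf[term][cls] += count
            df[term][cls] += 1
    scores = {}
    for term, per_class in tf.items():
        total = sum(per_class.values())
        scores[term] = max(
            (per_class[c] / total) * (df[term][c] / class_sizes[c])
            for c in per_class
        )
    return scores


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of rows whose equivalence
    class under `attrs` carries a single decision label."""
    groups = defaultdict(list)
    for row, cls in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(cls)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels, attrs):
    """QuickReduct-style forward selection (assumed stand-in): add the
    attribute that raises the dependency degree most until it matches
    the dependency of the full attribute set."""
    target = dependency(rows, labels, attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen


def select_features(docs, labels, keep):
    """Stage 1: keep the `keep` highest-scoring terms.
    Stage 2: remove redundant terms with the greedy reduct."""
    scores = distinguishing_scores(docs, labels)
    prefiltered = sorted(scores, key=lambda t: (-scores[t], t))[:keep]
    rows = []
    for tokens in docs:
        present = set(tokens)
        rows.append({t: t in present for t in prefiltered})
    return greedy_reduct(rows, labels, prefiltered)


# Toy corpus, invented for illustration only.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "team"],
    ["vote", "party", "team"],
    ["vote", "election", "party"],
]
labels = ["sport", "sport", "politics", "politics"]
selected = select_features(docs, labels, keep=4)
```

On this toy corpus the prefilter drops the terms that appear in only one document, and the reduction step then discards terms that add no further ability to separate the two classes. The greedy search returns a small consistent subset, not necessarily a minimal one.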
WANG Xing, ZHANG Wenpeng. Feature selection based on feature distinguish ability and meta-information[J]. Computer Engineering and Applications, 2012, 48(7): 128-131.
王 兴,张文鹏. 基于特征辨别能力和元信息的特征选择[J]. 计算机工程与应用, 2012, 48(7): 128-131.
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2012/V48/I7/128