Computer Engineering and Applications (计算机工程与应用), 2009, Vol. 45, Issue (35): 6-9. DOI: 10.3778/j.issn.1002-8331.2009.35.003

• Doctoral Forum •

Comprehensive feature selection based on rough sets and gray correlation

ZHU Hao-dong (朱颢东)1,2, ZHONG Yong (钟勇)1,2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  2. The Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
  • Received: 2009-09-03 Revised: 2009-10-13 Online: 2009-12-11 Published: 2009-12-11
  • Contact: ZHU Hao-dong

Abstract: In text feature spaces, the number of feature dimensions often reaches tens of thousands. This severely limits the choice of classification algorithms, degrades their performance, and complicates classifier design, so feature selection is needed to avoid the curse of dimensionality. A comprehensive feature selection method is presented. The method first applies an optimized document frequency measure for preliminary feature selection, filtering out some terms to reduce the sparsity of the feature space. It then employs an attribute reduction algorithm based on rough sets and gray correlation to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the comprehensive method is effective.
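
The two-stage idea in the abstract (a document-frequency prefilter followed by a redundancy measure) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the "optimized" document frequency or the rough-set reduction, so the sketch assumes plain lower and upper document-frequency thresholds and the standard grey relational grade with distinguishing coefficient 0.5. The function names, the parameters `min_df`, `max_df_ratio`, and `rho`, and the toy data are all hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def df_filter(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms whose document frequency is in [min_df, max_df_ratio * N].

    Assumption: simple thresholds stand in for the paper's optimized measure.
    """
    df = document_frequency(docs)
    upper = max_df_ratio * len(docs)
    return sorted(t for t, c in df.items() if min_df <= c <= upper)


def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence to the reference.

    Coefficient: (d_min + rho * d_max) / (d + rho * d_max), averaged over
    positions, with d_min and d_max taken over all candidates and positions.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in candidates]
    flat = [d for row in deltas for d in row]
    if not flat:
        return []
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate equals the reference
        return [1.0] * len(candidates)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


# Toy corpus: each document is a list of tokens (hypothetical data).
docs = [
    ["rough", "set", "text"],
    ["rough", "text", "gray"],
    ["text", "feature"],
    ["rough", "text"],
]
kept_terms = df_filter(docs)

# Toy sequences: a class indicator and two term-occurrence vectors.
grades = grey_relational_grades([1, 0, 1, 0], [[1, 0, 1, 0], [0, 1, 0, 1]])
```

In this sketch a term that occurs in nearly every document or in only one document is dropped before any redundancy analysis, which is the sparsity-reduction role the abstract assigns to the first stage. A high grey relational grade between two feature sequences would mark one of them as a candidate for removal in the second stage.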

CLC Number: