Computer Engineering and Applications, 2010, Vol. 46, Issue (20): 153-156. DOI: 10.3778/j.issn.1002-8331.2010.20.043

• Artificial Intelligence •

Feature selection based on feature similarity measure

JIANG Sheng-yi 1, WANG Lian-xi 2

  1. School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China
  2. School of Management, Guangdong University of Foreign Studies, Guangzhou 510006, China
  • Received: 2010-04-14 Revised: 2010-05-20 Online: 2010-07-11 Published: 2010-07-11
  • Contact: JIANG Sheng-yi

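Feature selection based on feature correlation (基于特征相关性的特征选择)

蒋盛益 (JIANG Sheng-yi) 1, 王连喜 (WANG Lian-xi) 2

  1. School of Informatics, Guangdong University of Foreign Studies (广东外语外贸大学 信息学院), Guangzhou 510006
  2. School of Management, Guangdong University of Foreign Studies (广东外语外贸大学 国际工商管理学院), Guangzhou 510006
  • Corresponding author: 蒋盛益 (JIANG Sheng-yi)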

Abstract: This paper proposes a feature selection algorithm based on a feature similarity measure. The method first clusters the features according to the similarity measure and then chooses representative features from each cluster. Finally, it removes the representatives that are irrelevant or only weakly relevant to the class feature, and the remaining features form the selected subset. Theoretical analysis indicates that the method has low time complexity and can therefore be applied to feature selection on high-dimensional data. Experiments on UCI datasets, comparing the algorithm with other classic feature selection approaches, show that it performs better in terms of dimensionality reduction and classification performance.

Key words: feature selection, similarity, feature clustering, classification

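Abstract (translated from the Chinese version): A feature selection method based on feature correlation is proposed. The method first clusters the features, using the degree of mutual dependence between features (their correlation) as the clustering criterion. It then picks representative features from each feature cluster, and removes from the picked features those that are irrelevant or only weakly relevant to the target feature. The features that remain form the final feature subset. Theoretical analysis shows that the method is computationally efficient, has low time complexity, and is suitable for feature selection on large-scale datasets. Experimental comparison and analysis against classic methods from the literature on UCI datasets show that the proposed method performs better in feature reduction and classification.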

Key words (translated from the Chinese version): feature selection, correlation, feature clustering, classification
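
The three steps named in the abstract (cluster features by similarity, pick a representative per cluster, drop representatives weakly related to the class) can be illustrated with a minimal sketch. The abstract does not state the exact similarity measure or clustering procedure, so everything specific here is an assumption: absolute Pearson correlation stands in for the similarity measure, a simple one-pass "leader" clustering stands in for the clustering step, and the names `abs_pearson`, `select_features`, `sim_threshold`, and `rel_threshold` are hypothetical.

```python
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return abs(cov) / sqrt(var_x * var_y)


def select_features(columns, target, sim_threshold=0.8, rel_threshold=0.3):
    """Return the sorted indices of the selected feature columns.

    columns: list of feature columns, each a list of numbers
    target:  numeric class labels, one per sample
    Both thresholds are hypothetical tuning parameters (assumptions).
    """
    # Step 1: one-pass clustering. A feature joins the first cluster whose
    # leader (first member) is similar enough; otherwise it starts a new one.
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs_pearson(col, columns[cluster[0]]) >= sim_threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])

    # Step 2: each cluster is represented by its member that is most
    # strongly correlated with the class labels.
    relevance = [abs_pearson(col, target) for col in columns]
    representatives = [max(c, key=lambda j: relevance[j]) for c in clusters]

    # Step 3: drop representatives that are irrelevant or weakly relevant.
    return sorted(j for j in representatives if relevance[j] >= rel_threshold)


# Toy data: f0 tracks the class, f1 is nearly a copy of f0, f2 is noise.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 13],
    [5, 1, 4, 2, 6, 3],
]
target = [0, 0, 0, 1, 1, 1]
print(select_features(columns, target))  # [0]
```

In this toy run f0 and f1 fall into one cluster and f0 represents it, while f2 forms its own cluster and is then discarded as weakly relevant. Because each feature is compared only with cluster leaders rather than with every other feature, a one-pass scheme of this kind stays cheap as the number of features grows, which is in the spirit of the low time complexity the abstract claims, though the paper's own procedure may differ.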
