Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (12): 70-75. DOI: 10.3778/j.issn.1002-8331.1605-0185

• Big Data and Cloud Computing •

Method based on minimum redundancy and maximum separability for feature selection

LAI Xuefang (赖学方), HE Xingshi (贺兴时)

  1. School of Science, Xi’an Polytechnic University, Xi’an 710048, China
  • Online: 2017-06-15 Published: 2017-07-04

Abstract: Feature selection is an effective technique for handling high-dimensional data. To address the shortcomings of traditional methods, a feature selection evaluation criterion called minimum redundancy and maximum separability is proposed by combining the F-score with mutual information. Features selected under this criterion have better classification and prediction ability. Two search strategies, the binary cuckoo search algorithm and quadratic programming, are used to search for the optimal feature subset, and their accuracy and computational cost are analyzed and compared. Finally, experiments on UCI datasets verify the effectiveness of the proposed approach.

Key words: high-dimensional data, Fisher score (F-score), search strategy, feature selection
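
The abstract names the two ingredients of the criterion (F-score for class separability, mutual information for redundancy) but does not give its formula. The following Python sketch is therefore only an illustration under stated assumptions: `fisher_score` uses the common between-class over within-class variance ratio, `mutual_information` is a simple histogram estimate, and `subset_score` combines them as a plain difference in the style of mRMR. The function names, the bin count, and the difference form are all hypothetical and are not taken from the paper. The search step (binary cuckoo search or quadratic programming) is not shown.

```python
import numpy as np


def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def mutual_information(a, b, bins=8):
    """Histogram estimate of I(a; b) in nats for two 1-D arrays."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nonzero = p_ab > 0
    ratio = p_ab[nonzero] / (p_a @ p_b)[nonzero]
    return float(np.sum(p_ab[nonzero] * np.log(ratio)))


def subset_score(X, y, subset, bins=8):
    """Assumed criterion: mean Fisher score minus mean pairwise mutual information."""
    X = np.asarray(X, dtype=float)
    subset = list(subset)
    separability = fisher_score(X, y)[subset].mean()
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if not pairs:
        return float(separability)
    redundancy = np.mean(
        [mutual_information(X[:, i], X[:, j], bins) for i, j in pairs]
    )
    return float(separability - redundancy)


if __name__ == "__main__":
    # Synthetic data: one informative feature, a near-copy of it, and pure noise.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    informative = y + 0.3 * rng.standard_normal(200)
    near_copy = informative + 0.01 * rng.standard_normal(200)
    noise = rng.standard_normal(200)
    X = np.column_stack([informative, near_copy, noise])
    print("Fisher scores:", fisher_score(X, y))
    print("Score of subset {0, 1}:", subset_score(X, y, [0, 1]))
    print("Score of subset {0, 2}:", subset_score(X, y, [0, 2]))
```

How strongly redundancy is penalized relative to separability is a design choice. In this unweighted difference form the two terms are on different scales, so a practical variant would likely normalize them or add a trade-off weight.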