Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (3): 125-130. DOI: 10.3778/j.issn.1002-8331.1608-0326

Feature selection algorithm based on multi-criterion ranking and C-SVM

SUN Qin1, 2, JIANG Yanhuang1, 2, HU Wei2, ZHANG Yi2, GAO Feng3   

  1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
  2. College of Computer, National University of Defense Technology, Changsha 410073, China
  3. Unit 91550 of PLA, China
  • Online: 2018-02-01 Published: 2018-02-07

Feature selection algorithm combining multi-criterion weighted ranking with C-SVM

孙  勤1,2,蒋艳凰1,2,胡  维2,张  毅2,高  峰3   

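  1. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073
  2. College of Computer, National University of Defense Technology, Changsha 410073
  3. Unit 91550 of the Chinese People's Liberation Army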

Abstract: Data sets in data mining are often large and high-dimensional, which increases storage requirements, lengthens processing time, and lowers prediction accuracy. To address two shortcomings of existing feature selection algorithms, namely limited accuracy and the difficulty of determining the optimal number of features, this paper introduces mCRC, a feature selection approach based on multi-criterion ranking and C-SVM. mCRC measures the dependency between each feature and the class labels using mutual information and class distance, deletes irrelevant and completely redundant attributes, and then ranks the remaining features by dependency and redundancy. To obtain the optimal subset, it uses C-SVM classifiers to filter the ranked features via Sequential Forward Floating Selection (SFFS). Compared with traditional algorithms, mCRC achieves high accuracy with less training time, which supports fast and efficient mining of massive data.
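The ranking stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: feature values are taken to be discrete and numeric, mutual information is estimated from empirical counts, a ratio of between-class to within-class scatter stands in for the class-distance criterion, the two criteria are min-max normalized and combined with equal weights, and "completely redundant" is approximated as an exact duplicate column. The names `mutual_information`, `class_distance`, and `rank_features` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def class_distance(xs, ys):
    """Between-class over within-class scatter (assumed stand-in criterion)."""
    mean = sum(xs) / len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    between = within = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        between += len(g) * (m - mean) ** 2
        within += sum((x - m) ** 2 for x in g)
    return between / (within + 1e-12)  # small constant avoids division by zero


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def rank_features(columns, labels, mi_floor=1e-9):
    """Return indices of the kept features, most important first."""
    mi = [mutual_information(col, labels) for col in columns]
    kept, seen = [], set()
    for j, col in enumerate(columns):
        key = tuple(col)
        # Skip irrelevant features (no information about the labels)
        # and exact duplicates of a feature that was already kept.
        if mi[j] <= mi_floor or key in seen:
            continue
        seen.add(key)
        kept.append(j)
    if not kept:
        return []
    mi_n = min_max([mi[j] for j in kept])
    cd_n = min_max([class_distance(columns[j], labels) for j in kept])
    # Equal weights are an assumption, not the paper's weighting scheme.
    score = {j: 0.5 * a + 0.5 * b for j, a, b in zip(kept, mi_n, cd_n)}
    return sorted(kept, key=lambda j: score[j], reverse=True)


labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1],  # fully determines the label
    [0, 1, 0, 1, 0, 1],  # weakly related to the label
    [5, 5, 5, 5, 5, 5],  # constant, hence irrelevant
    [0, 0, 0, 1, 1, 1],  # exact duplicate of the first column
]
ranking = rank_features(columns, labels)
print(ranking)
```

On this toy data the constant column and the duplicate column are discarded before ranking, and the fully informative column is placed ahead of the weakly related one.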

Key words: feature selection, maximum relevance minimum redundancy, multi-criterion ranking, C-SVM, Sequential Forward Floating Selection (SFFS)

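Abstract (translated from the Chinese original): The data acquired in data mining are often high-dimensional, which commonly leads to large storage requirements, long knowledge-mining times, and low prediction accuracy. Feature selection is one of the important ways to address these problems. Existing feature selection algorithms have difficulty determining the optimal number of features, and their classification accuracy leaves room for improvement. To address these issues, a multi-criterion weighted ranking algorithm (mCRC) that considers relevance and redundancy simultaneously is proposed. mCRC combines two criteria to rank the features, and then applies sequential forward floating search with C-SVM to the features sorted in descending order of importance to obtain the optimal feature subset. Experimental results show that, compared with feature selection methods that rank features by mutual information alone or by class separability alone, mCRC obtains an optimal feature subset with better classification performance in less time, providing strong support for fast and efficient mining of data sets.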
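The search stage can be sketched as a generic sequential forward floating search over a list of feature indices that is already sorted by importance. This is a sketch under stated assumptions, not the paper's implementation: `sffs` and `toy_score` are hypothetical names, and `score(subset)` is a caller-supplied evaluator. In the setting described above the evaluator would presumably be the classification accuracy of a C-SVM trained on the candidate subset; a toy scoring function is used here so that the example runs with the standard library alone.

```python
def sffs(ranked, score, max_size=None):
    """Sequential forward floating search over `ranked` feature indices.

    `score(subset)` is a caller-supplied evaluator; higher is better.
    Returns (best_subset, best_score).
    """
    if not ranked:
        return [], float("-inf")
    max_size = max_size or len(ranked)
    selected = []
    best_by_size = {}  # subset size -> (score, subset) best seen at that size
    while len(selected) < max_size:
        # Forward step: add the feature that gives the highest score.
        # Ties go to the feature that appears earlier in `ranked`.
        candidates = [f for f in ranked if f not in selected]
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        cur = score(selected)
        if cur > best_by_size.get(len(selected), (float("-inf"), None))[0]:
            best_by_size[len(selected)] = (cur, list(selected))
        # Floating step: drop the least useful feature as long as the reduced
        # subset beats the best subset previously seen at that smaller size.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s = score(reduced)
            if s > best_by_size[len(reduced)][0]:
                selected = reduced
                best_by_size[len(reduced)] = (s, list(reduced))
            else:
                break
    best_score, best_subset = max(best_by_size.values(), key=lambda t: t[0])
    return best_subset, best_score


# Toy evaluator (an assumption for illustration): features 0 and 2 are
# useful, and every selected feature costs a small penalty.
USEFUL = {0: 0.5, 2: 0.3}


def toy_score(subset):
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)


subset, best = sffs([0, 1, 2, 3], toy_score)
print(subset, round(best, 2))
```

Because the search records the best score seen at each subset size, the number of features does not have to be fixed in advance: the returned subset is simply the best one encountered. To use a real classifier, `score` could, for example, return the cross-validated accuracy of a C-SVM implementation such as scikit-learn's `SVC` on the selected columns.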

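Key words (translated from the Chinese original): feature selection, maximum relevance minimum redundancy, multi-criterion weighted ranking, C-support vector machine, sequential forward floating search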