Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (32): 140-146.

Support vector machine study by combining feature selection and learn strategy

LV Pin1,2,3, ZHONG Luo1, CAI Dunbo2,3   

  1. College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430073, China
  3. Hubei Province Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
  • Online:2012-11-11 Published:2012-11-20

融合特征选取和学习策略的支持向量机研究

吕  品1,2,3,钟  珞1,蔡敦波2,3   

  1. 武汉理工大学 计算机科学与技术学院,武汉 430070
  2. 武汉工程大学 计算机科学与工程学院,武汉 430073
  3. 武汉工程大学 智能机器人湖北省重点实验室,武汉 430073

Abstract: The Support Vector Machine (SVM) is an important machine learning method that has been applied successfully to many real-world classification problems. To improve the classification accuracy and training efficiency of SVMs, this paper follows the classification procedure and reviews the feature selection algorithms and learning strategies applied before SVM training. It then compares the classification accuracy of the feature selection methods SFS, IWSS, IWSSr, and BARS, and analyzes two performance measures on the test set, classification accuracy and the precision/recall breakeven point, for classifiers obtained by combining an active learning strategy with SVM. Experimental results indicate that feature selection combining the filter method with the wrapper method significantly improves classification accuracy and dramatically reduces the number of training samples. When few labeled training samples are available, active learning achieves better accuracy; to reach the same accuracy, passive learning needs six times as many training samples as active learning.

Key words: support vector machine, feature selection, learning strategy, optimization methodologies, threshold
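
Note on the precision/recall breakeven point: the abstract names this measure without defining it. A minimal sketch of one common way to compute it is given here, assuming binary labels (1 = positive) and real-valued classifier scores such as SVM decision values. The function name `breakeven_point` and the sample data are illustrative assumptions and are not taken from the paper. The sketch relies on the fact that when exactly as many examples are predicted positive as there are true positives, precision and recall share the same denominator and are therefore equal.

```python
def breakeven_point(scores, labels):
    """Return the precision/recall breakeven point for binary labels.

    scores: classifier outputs, higher means "more likely positive".
    labels: ground truth, 1 for positive and 0 for negative.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    # Rank examples from most to least confident positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # With exactly n_pos predicted positives:
    #   precision = TP / n_pos and recall = TP / n_pos, so they coincide.
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(breakeven_point(scores, labels))
```

In this toy data there are three positives, and two of the three top-ranked examples are positive, so precision and recall both equal 2/3 at the breakeven cutoff.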

摘要: 支持向量机是重要的机器学习方法之一,已成功解决了许多实际的分类问题。围绕如何提高支持向量机的分类精度与训练效率,以分类过程为主线,主要综述了在训练支持向量机之前不同的特征选取方法与学习策略。在此基础上,比较了不同的特征选取方法SFS,IWSS,IWSSr以及BARS的分类精度,分析了主动学习策略与支持向量机融合后获得的分类器在测试集上的分类精度与正确率/召回率平衡点两个性能指标。实验结果表明,包装方法与过滤方法相结合的特征选取方法能有效提高支持向量机的分类精度和减少训练样本量;在标签数据较少的情况下,主动学习能达到更好的分类精度,而为了达到相同的分类精度,被动学习需要的样本数量必须要达到主动学习的6倍。

关键词: 支持向量机, 特征选取, 学习策略, 优化方法, 阈值