Computer Engineering and Applications, 2007, Vol. 43, Issue 20, pp. 210-213.

• Engineering and Applications (工程与应用)

Study of feature selection method based on support vector machine and its application

JIANG Lin (蒋琳), affiliations 1 and 2; PENG Li (彭黎), affiliation 2

  1. Hunan Business College, Changsha 410205, China
  2. Software School of Hunan University, Changsha 410082, China
  • Online: 2007-07-11; Published: 2007-07-11
  • Contact: JIANG Lin


Abstract: The support vector machine (SVM), a machine learning method, can efficiently solve classification problems. This study develops a new classification-based feature selection algorithm that searches a pool of relevant and irrelevant features for the subset best suited to classification. The algorithm also ranks all features systematically by the strength of their correlation with the class labels. It is then applied to identify a combination of risk factors for type II diabetes. The best subset of risk factors found for this disease consists of waist circumference, waist-to-hip ratio, diastolic blood pressure and age; with this subset as input, SVM classification achieves a sensitivity of 0.8666, a specificity of 0.6420 and an accuracy of 0.7014. The algorithm is also compared with principal component analysis (PCA), and the results show that it outperforms PCA for feature extraction.
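The abstract does not spell out how the SVM ranks features or searches for the best subset. As one illustration of the general idea, the sketch below uses a swapped-in, commonly used technique: rank features once by the magnitude of a linear SVM's weights (in the spirit of SVM-RFE, but without its recursive retraining), then score nested top-k subsets on a validation set and keep the smallest best-scoring one. It is not the authors' algorithm; the solver (full-batch subgradient descent on the hinge loss), all function names, the hyperparameters and the toy data are hypothetical.

```python
import random

# Hypothetical sketch: linear-SVM weight ranking + nested-subset search.
# Not the authors' algorithm; solver, names and hyperparameters are assumptions.


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b)),
    with labels y_i in {-1, +1}. Returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += (1 if score >= 0 else -1) == yi
    return hits / len(y)


def columns(X, cols):
    """Keep only the given feature columns, in the given order."""
    return [[row[c] for c in cols] for row in X]


def select_features(X_tr, y_tr, X_val, y_val):
    """Rank features by |w_j| of one linear SVM, then return the smallest
    top-k subset with the highest validation accuracy."""
    w, _ = train_linear_svm(X_tr, y_tr)
    ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    best_subset, best_acc = [], -1.0
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        wk, bk = train_linear_svm(columns(X_tr, cols), y_tr)
        acc = accuracy(wk, bk, columns(X_val, cols), y_val)
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_subset, best_acc = cols, acc
    return ranking, best_subset, best_acc


def make_toy_data(n, n_noise=3, seed=0):
    """Hypothetical toy data: feature 0 decides the label, the others are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        x0 = label * (0.5 + 0.5 * rng.random())  # informative, with a gap around 0
        X.append([x0] + [rng.uniform(-1, 1) for _ in range(n_noise)])
        y.append(label)
    return X, y


X_tr, y_tr = make_toy_data(100, seed=1)
X_val, y_val = make_toy_data(60, seed=2)
ranking, subset, val_acc = select_features(X_tr, y_tr, X_val, y_val)
print("ranking:", ranking, "best subset:", subset, "validation accuracy:", val_acc)
```

Weight magnitudes are only comparable when features share a scale, so real clinical variables such as waist circumference and age would need to be standardized before ranking them this way.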

Keywords: SVM, feature selection, classification, type II diabetes

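Abstract (Chinese version, translated): The support vector machine is a classification technique based on the principle of structural risk minimization and has gradually attracted the attention of researchers in China and abroad. This paper proposes a feature screening algorithm for selecting the best feature subset, which also ranks features by the strength of their relevance to classification. The reliability and feasibility of the method are verified by applying it to type II diabetes discrimination and risk-factor screening. When the feature subset extracted by the algorithm, {waist circumference, waist-to-hip ratio, diastolic blood pressure, age}, is used as the input vector, sensitivity, specificity and accuracy reach their highest values: 0.8666, 0.6420 and 0.7014, respectively. The algorithm is also compared with principal component analysis, and experiments show that it outperforms principal component analysis for feature extraction. The algorithm is therefore an effective method for classification and risk-factor screening, and offers a workable approach to this class of problems.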
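The reported figures follow the usual confusion-matrix definitions, assuming diabetic cases are the positive class (the abstract does not state this). The sketch below computes sensitivity, specificity and accuracy from hypothetical labels, and uses the identity accuracy = p * sensitivity + (1 - p) * specificity, where p is the share of positive cases, to back out p. Applied to the reported 0.8666, 0.6420 and 0.7014 it gives roughly 0.264, but only under the further assumption that all three metrics were measured on the same evaluation set. The function names and labels are illustrative.

```python
# Hypothetical helpers; positive class assumed to be "diabetic" (label 1).


def sens_spec_acc(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def implied_prevalence(sensitivity, specificity, accuracy):
    """Solve accuracy = p*sensitivity + (1 - p)*specificity for p."""
    return (accuracy - specificity) / (sensitivity - specificity)


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, acc = sens_spec_acc(y_true, y_pred)
print(round(sens, 4), round(spec, 4), round(acc, 4))  # 0.75 0.6667 0.7
print(round(implied_prevalence(sens, spec, acc), 3))  # 0.4
print(round(implied_prevalence(0.8666, 0.6420, 0.7014), 3))  # 0.264
```

The identity also explains why the reported accuracy lies between specificity and sensitivity and closer to specificity: if the three metrics share one evaluation set, most of its cases would be negatives.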

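Keywords (Chinese version, translated): support vector machine, feature extraction, classification and recognition, type II diabetes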