Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (20): 210-213.

• Engineering and Applications •

Study of feature selection method based on support vector machine and its application

JIANG Lin (1,2), PENG Li (2)

  1. Hunan Business College, Changsha 410205, China
  2. Software School of Hunan University, Changsha 410082, China
  • Online:2007-07-11 Published:2007-07-11
  • Contact: JIANG Lin

Abstract: The support vector machine (SVM) is a classification technique based on the principle of structural risk minimization, and it has drawn growing attention from researchers in China and abroad. This study proposes a classification-based feature selection algorithm that searches a group of features, relevant or irrelevant, for the best subset for classification, and that ranks all features by the strength of their correlation with the categories. The algorithm is applied to type II diabetes discrimination and risk-factor screening in order to verify its reliability and feasibility. With the selected subset {waistline, waistline/hip-girth, diastolic blood pressure, age} as the input vector, SVM classification reaches its highest sensitivity, specificity and accuracy: 0.8666, 0.6420 and 0.7014, respectively. The algorithm is also compared with principal component analysis (PCA), and the experiments show that it is superior to PCA for feature extraction. The algorithm is therefore an effective method for classification and risk-factor screening.

Key words: support vector machine (SVM), feature selection, feature extraction, classification, type II diabetes
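
The sensitivity, specificity and accuracy quoted in the abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed, assuming the diabetic class is treated as the positive class (the abstract does not state this). The class name, function names and example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positives classified as positive
    fn: int  # positives classified as negative
    tn: int  # negatives classified as negative
    fp: int  # negatives classified as positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives that are detected: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives that are rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def implied_positive_fraction(sens: float, spec: float, acc: float) -> float:
    """Solve acc = p * sens + (1 - p) * spec for p, the share of positives.

    The identity only holds when all three figures come from the same test set.
    """
    if sens == spec:
        raise ValueError("p is undetermined when sensitivity equals specificity")
    return (acc - spec) / (sens - spec)


if __name__ == "__main__":
    # Illustrative counts only; they are NOT the study's data.
    counts = ConfusionCounts(tp=40, fn=10, tn=90, fp=60)
    print(sensitivity(counts), specificity(counts), accuracy(counts))

    # Figures quoted in the abstract, plugged into the identity above.
    print(implied_positive_fraction(0.8666, 0.6420, 0.7014))
```

If the three quoted figures were measured on one test set, the identity implies that roughly 26% of that set was positive. This is an inference from the reported numbers, not a figure given in the abstract.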