Computer Engineering and Applications (计算机工程与应用), 2016, Vol. 52, Issue 4: 51-55.

• Big Data and Cloud Computing •

Improved SVM-KNN algorithm for imbalanced datasets classification

WANG Chaoxue1, ZHANG Tao1, MA Chunsen2   

  1. School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
  2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  • Online: 2016-02-15  Published: 2016-02-03

Abstract: An improved SVM-KNN algorithm that combines the Support Vector Machine (SVM) with K Nearest Neighbor (KNN) is presented to improve the accuracy of imbalanced-data classification near the SVM hyperplane. In the classification phase, the algorithm computes the distance from the test sample to the optimal SVM hyperplane in the feature space. If the distance is greater than a given threshold, the test sample is classified by the SVM; otherwise, the support vectors (SVs) of the different classes are used as the test sample's candidate nearest neighbors, and the test sample is classified by KNN. Extensive experiments on UCI datasets show that the algorithm significantly improves the recognition rate of minority-class samples and the overall classification performance.
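
The decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear SVM that has already been trained elsewhere and is given by a weight vector w and bias b, and the names svm_knn_predict, eps (the distance threshold) and k (the neighbor count) are hypothetical. The abstract measures distances in the feature space; with a linear kernel that coincides with the input-space Euclidean distance used here.

```python
import math
from collections import Counter


def hyperplane_distance(w, b, x):
    """Return (signed SVM output, geometric distance of x to the hyperplane)."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return g, abs(g) / norm_w


def svm_knn_predict(x, w, b, support_vectors, sv_labels, eps=0.5, k=3):
    """Classify x with the SVM when it is far from the hyperplane,
    otherwise by a k-nearest-neighbor vote among the support vectors.

    eps and k are illustrative defaults (assumptions), not values from the paper.
    """
    g, dist = hyperplane_distance(w, b, x)
    if dist > eps:
        # Far from the hyperplane: the sign of the SVM output decides.
        return 1 if g > 0 else -1
    # Near the hyperplane: rank all support vectors by distance to x
    # and take a majority vote over the k closest ones.
    ranked = sorted(
        zip(support_vectors, sv_labels),
        key=lambda pair: math.dist(pair[0], x),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy setup (made-up numbers): hyperplane x1 = 0, minority class labeled +1.
w, b = [1.0, 0.0], 0.0
support_vectors = [[-1.0, 0.0], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
sv_labels = [-1, 1, 1, 1]

far_sample = svm_knn_predict([3.0, 0.0], w, b, support_vectors, sv_labels)
near_sample = svm_knn_predict([-0.1, 1.5], w, b, support_vectors, sv_labels)
```

In this toy setup, the sample at [-0.1, 1.5] lies just on the negative side of the hyperplane, so the SVM alone would assign it to the majority class. Because it falls within eps of the hyperplane, the vote among the nearest support vectors decides instead.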

Key words: Support Vector Machine (SVM), K Nearest Neighbor (KNN), imbalanced datasets