Imbalanced Data Classification with Improved SVM-KNN

Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (4): 51-55.

Improved SVM-KNN algorithm for imbalanced datasets classification

WANG Chaoxue1, ZHANG Tao1, MA Chunsen2

1. School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
2. China Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China

Online: 2016-02-15 Published: 2016-02-03

Abstract

Abstract: An improved SVM-KNN algorithm that combines the Support Vector Machine (SVM) with K Nearest Neighbor (KNN) is proposed to address the inaccuracy of SVM when classifying imbalanced datasets near the separating hyperplane. In the classification phase, the algorithm computes the distance from the test sample to the optimal SVM hyperplane in the feature space. If the distance is greater than a given threshold, the test sample is classified directly by SVM; otherwise, the support vectors of all classes are taken as the candidate nearest neighbors of the test sample, and the sample is classified by KNN. Extensive experiments on UCI datasets show that the algorithm markedly improves the recognition rate of minority-class samples and the overall performance of the classifier.

Keywords: Support Vector Machine (SVM), K Nearest Neighbor (KNN), imbalanced datasets
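
The classification phase described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a linear SVM that has already been trained (so the weight vector `w`, bias `b`, and the support vectors are given), measures distance in the input space rather than a kernel feature space, and uses a plain majority vote for KNN. The function name `svm_knn_predict` and the parameters `threshold` and `k` are hypothetical.

```python
import math
from collections import Counter


def svm_knn_predict(x, w, b, support_vectors, sv_labels, threshold=1.0, k=3):
    """Classify x with a linear SVM, falling back to KNN near the hyperplane.

    Assumptions (not from the paper): linear kernel, Euclidean distance,
    labels in {+1, -1}, majority vote among the k nearest support vectors.
    """
    # Decision value f(x) = w . x + b of the (already trained) linear SVM.
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Geometric distance from x to the hyperplane: |f(x)| / ||w||.
    distance = abs(f) / math.sqrt(sum(wi * wi for wi in w))

    if distance > threshold:
        # Far from the hyperplane: trust the SVM decision directly.
        return 1 if f > 0 else -1

    # Close to the hyperplane: run KNN with the support vectors as candidates.
    ranked = sorted(zip(support_vectors, sv_labels),
                    key=lambda pair: math.dist(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy example: hyperplane x1 = 0, with one negative support vector
# lying on the positive side (as can happen with a soft margin).
w, b = (1.0, 0.0), 0.0
svs = [(1.0, 0.0), (1.0, 1.0), (-1.0, 0.0), (0.3, 0.2)]
labels = [1, 1, -1, -1]

print(svm_knn_predict((3.0, 0.5), w, b, svs, labels, k=1))  # 1 (SVM branch)
print(svm_knn_predict((0.2, 0.1), w, b, svs, labels, k=1))  # -1 (KNN branch)
```

In the second call the plain SVM decision would be +1, but the sample lies within the threshold band, so the nearest support vector decides instead. How the threshold and `k` should be chosen is not stated in the abstract and is left open here.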

WANG Chaoxue, ZHANG Tao, MA Chunsen. Improved SVM-KNN algorithm for imbalanced datasets classification[J]. Computer Engineering and Applications, 2016, 52(4): 51-55.
