Resolution of classification for imbalanced dataset based on cluster-weight and grading-SVM algorithm

Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (21): 133-137.

• Databases, Data Mining, Machine Learning •


Resolution of classification for imbalanced dataset based on cluster-weight and grading-SVM algorithm

WANG Chaoxue1, ZHANG Tao1, MA Chunsen2

1.School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
2.China Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China

Online: 2015-11-01 Published: 2015-11-16

Abstract

Abstract: When a support vector machine (SVM) is trained on an imbalanced dataset (the class imbalance problem), its classification results tend to be biased toward the majority class. To address this, the paper proposes a phased SVM based on cluster weights (WSVM) that accounts for both between-class and within-class imbalance. In preprocessing, the K-means algorithm, combined with a weight assignment model, is used to obtain a weight for each majority-class sample. Classification then proceeds in three phases. The first phase uses the weights to select, from the boundary regions of the clusters within the majority class, as many samples as there are minority-class samples. The second phase trains an initial classifier on the selected samples and the minority-class samples. The third phase uses the unselected majority-class samples to adjust the initial classifier, and the final classifier is obtained once the stopping criterion is met. Extensive experiments on UCI datasets show that WSVM outperforms traditional classification algorithms in both the recognition rate of minority-class samples and overall classifier performance.

Key words: imbalanced dataset, weight assignment model, Support Vector Machine (SVM)
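
The preprocessing step and the first phase described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight assignment model, so the sketch assumes a hypothetical one: a sample's weight is its distance to its own cluster centroid divided by that cluster's largest such distance, so that samples near a cluster boundary receive weights close to 1. The function names (`kmeans`, `boundary_weights`, `select_boundary`), the deterministic initialization, and the global top-n selection are likewise assumptions made for illustration, not the authors' implementation. The second and third phases (training and adjusting the SVM) are omitted because the abstract does not specify the stopping criterion.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's K-means, started deterministically from the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def boundary_weights(points: List[Point], centroids: List[List[float]], labels: List[int]) -> List[float]:
    """Assumed weight model: distance to own centroid, scaled by the cluster radius."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    radius = [0.0] * len(centroids)
    for d, lab in zip(dists, labels):
        radius[lab] = max(radius[lab], d)
    return [d / radius[lab] if radius[lab] > 0 else 0.0 for d, lab in zip(dists, labels)]


def select_boundary(majority: List[Point], n_minority: int, k: int) -> Tuple[List[int], List[int]]:
    """Return (selected, unselected) indices of majority-class samples.

    The n_minority samples with the largest weights are selected, so the
    selected set matches the minority class in size.
    """
    centroids, labels = kmeans(majority, k)
    weights = boundary_weights(majority, centroids, labels)
    order = sorted(range(len(majority)), key=lambda i: weights[i], reverse=True)
    return sorted(order[:n_minority]), sorted(order[n_minority:])


# Toy majority class with two well-separated clusters (made-up data).
majority = [(0.0, 0.0), (10.0, 10.0), (0.0, 3.0), (10.0, 13.0), (1.0, 0.0), (11.0, 10.0)]
selected, unselected = select_boundary(majority, n_minority=2, k=2)
print(selected)    # [2, 3]: the outermost sample of each cluster
print(unselected)  # [0, 1, 4, 5]
```

In a full pipeline, the `selected` samples would be combined with the minority-class samples to train the initial SVM, and the `unselected` samples would be held back for the adjustment phase.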

WANG Chaoxue, ZHANG Tao, MA Chunsen. Resolution of classification for imbalanced dataset based on cluster-weight and grading-SVM algorithm[J]. Computer Engineering and Applications, 2015, 51(21): 133-137.

