Oversampling Method for Unbalanced Data Sets Based on SVM

ZHANG Zhonglin, FENG Yibang, ZHAO Zhongkai   

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China


Classifiers trained on imbalanced data sets tend to produce results that are biased towards the majority class, and resampling is one of the effective ways to address this problem. However, traditional oversampling algorithms tend to synthesize invalid samples, while undersampling methods tend to discard important sample information. To address this, an Oversampling Method based on SVM (SVMOM) is proposed. SVMOM synthesizes samples iteratively. In each iteration, the classification hyperplane is first obtained by an SVM. Each minority sample is then assigned a distance weight according to its distance from the classification hyperplane. To account for the intraclass balance of the minority class, the density of each sample is calculated from the sample distribution and used to assign a density weight. The selection weight of each minority sample is then computed from its distance weight and density weight. Finally, samples are selected according to their selection weights, and SMOTE is used to synthesize new samples from them, thereby balancing the data set. Experimental results show that the proposed algorithm alleviates, to a certain extent, the bias of classification results towards the majority class, which verifies its effectiveness.
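The weighting and synthesis steps of one iteration can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight formulas, so the inverse-distance weight, the k-nearest-neighbour sparsity measure, the mixing parameter `alpha`, and all function names below are assumptions chosen for illustration. The hyperplane coefficients `w` and `b` are assumed to come from a linear SVM fitted elsewhere and refitted at each iteration.

```python
import math
import random

def distance_weights(minority, w, b):
    """Normalized weights from the distance of each sample to w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dists = [abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
             for x in minority]
    # Assumption: samples closer to the hyperplane receive larger weights.
    raw = [1.0 / (d + 1e-9) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]

def density_weights(minority, k=3):
    """Normalized weights from local sparsity (needs >= 2 distinct samples)."""
    raw = []
    for i, x in enumerate(minority):
        d = sorted(math.dist(x, y) for j, y in enumerate(minority) if j != i)
        # Assumption: a larger mean distance to the k nearest minority
        # neighbours means a sparser region, hence a larger weight.
        raw.append(sum(d[:k]) / min(k, len(d)))
    total = sum(raw)
    return [r / total for r in raw]

def selection_weights(minority, w, b, k=3, alpha=0.5):
    """Combine both weights; the convex mix with alpha is an assumption."""
    dw = distance_weights(minority, w, b)
    sw = density_weights(minority, k)
    return [alpha * d + (1.0 - alpha) * s for d, s in zip(dw, sw)]

def synthesize(minority, weights, k=3, rng=random):
    """Draw a seed by selection weight, then interpolate SMOTE-style
    towards one of its k nearest minority neighbours."""
    seed = rng.choices(minority, weights=weights, k=1)[0]
    neighbours = sorted((y for y in minority if y is not seed),
                        key=lambda y: math.dist(seed, y))[:k]
    mate = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [s + gap * (m - s) for s, m in zip(seed, mate)]
```

In a full loop, the synthesized sample would be appended to the minority class, the SVM refitted, and the weights recomputed until the classes are balanced.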

Keywords: imbalanced data, Support Vector Machine (SVM), over-sampling, sample weight, Synthetic Minority Over-sampling Technique (SMOTE)


