Computer Engineering and Applications, 2019, Vol. 55, Issue (17): 55-62. DOI: 10.3778/j.issn.1002-8331.1808-0410

• Theory, Research and Development •

Research on Imbalanced Data Classification Based on L-SMOTE and SVM

LUO Kangyang, WANG Guoqiang

  1. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China
  2. College of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
  • Published: 2019-09-01 Online: 2019-08-30

Abstract: To address the low classification effectiveness on imbalanced datasets, this paper proposes an improved SMOTE algorithm (FTL-SMOTE) based on the L-SMOTE algorithm and an SVM with a mixed kernel. First, the dataset is classified using the mixed-kernel SVM. Second, three principles of noise sample recognition are proposed to identify noise samples precisely and remove them, after which the F-SMOTE and T-SMOTE algorithms are used to oversample the misclassified and the correctly classified minority-class samples, respectively. This process is repeated until the termination condition is satisfied. Extensive experiments on UCI datasets, comparing against classic SMOTE, other important sampling algorithms, and the standard SVM, show that the proposed method achieves better classification performance, and that the improved algorithm substantially reduces running time compared with L-SMOTE.

Key words: imbalanced dataset, classification, Synthetic Minority Over-sampling Technique (SMOTE), mixed kernel function, Support Vector Machine (SVM)
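
The abstract names two building blocks, a mixed kernel function and SMOTE-style oversampling, without giving their exact form. The following is a minimal sketch of the generic versions of both, not the paper's FTL-SMOTE: it assumes the mixed kernel is a convex combination of a polynomial and an RBF kernel (a common construction, not confirmed by the abstract), and it implements only classic SMOTE interpolation. The noise-recognition principles and the F-SMOTE and T-SMOTE rules are not specified here and are therefore omitted. All function and parameter names (`mixed_kernel`, `weight`, `smote`, `n_new`) are hypothetical.

```python
import math
import random


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mixed_kernel(x, y, weight=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Assumed mixed kernel: weight * polynomial + (1 - weight) * RBF.

    weight is expected to lie in [0, 1]; a non-negative combination of
    two valid kernels is again a valid kernel.
    """
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))


def smote(minority, n_new, k=3, rng=None):
    """Classic SMOTE: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

With four minority samples such as `[(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]`, calling `smote(minority, n_new=4, k=2)` returns four new points, each inside the bounding box of the originals because every point is an interpolation between two existing samples. In an iterative scheme like the one the abstract describes, the minority set passed to such a routine would presumably be split by whether the current classifier labels each sample correctly, but that split is not shown here.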