Computer Engineering and Applications (计算机工程与应用), 2019, Vol. 55, Issue (16): 150-156. DOI: 10.3778/j.issn.1002-8331.1804-0218

• Pattern Recognition and Artificial Intelligence •

基于Lévy分布的不平衡数据过采样方法

张扬帆,张海鹏,孙俊   

  1. 江南大学 物联网工程学院,江苏 无锡 214122
  • 出版日期:2019-08-15 发布日期:2019-08-13

Lévy-Based Oversampling Technique for Imbalanced Datasets

ZHANG Yangfan, ZHANG Haipeng, SUN Jun   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2019-08-15 Published:2019-08-13

摘要: 针对不平衡数据集上的分类问题,提出了基于Lévy分布的过采样方法,其核心思想是根据初始数据集的分布,利用Lévy分布构造新样本的密度分布。基于Lévy分布的特性,使得从边界样本合成的新样本密度最大,靠近多数类的样本合成的新样本密度次之,靠近少数类的样本合成的新样本密度最小。因此,该算法可以增强分类边界,同时可以减小噪声生成。通过在多个数据集上的实验,表明所提算法可以有效改善不平衡数据的分类效果。

关键词: 不平衡分类, Lévy分布, 过采样, 人工合成过采样技术(SMOTE)

Abstract: For classification problems on imbalanced datasets, a Lévy-based oversampling technique is proposed. Its core idea is to use the Lévy distribution to construct the density distribution of synthetic samples according to the distribution of the original dataset. Owing to the properties of the Lévy distribution, the density of new samples synthesized from borderline samples is the highest, the density of new samples synthesized from samples closer to the majority class is the second highest, and the density of new samples synthesized from samples closer to the minority class is the lowest. The approach can therefore strengthen the decision boundary while reducing the generation of noise. Experiments on multiple datasets show that the proposed approach effectively improves classification results on imbalanced datasets.

Key words: imbalanced classification, Lévy distribution, oversampling, Synthetic Minority Oversampling Technique (SMOTE)
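
The density ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so every detail here is an assumption made for illustration. In particular, the sketch assumes that each minority sample is scored by the fraction of majority-class points among its k nearest neighbours, that this ratio is passed through a Lévy density with location 0 and scale c = 1.5 (whose mode lies at c/3 = 0.5, i.e. at borderline samples), and that new points are created by SMOTE-style interpolation towards a randomly chosen other minority sample. The names `levy_pdf`, `majority_ratio` and `levy_oversample` are hypothetical.

```python
import math
import random


def levy_pdf(x, c=1.5):
    """Levy density with location 0 and scale c; taken as 0 for x <= 0.

    The mode is at c / 3, so the assumed default c = 1.5 peaks at x = 0.5.
    """
    if x <= 0:
        return 0.0
    return math.sqrt(c / (2 * math.pi)) * math.exp(-c / (2 * x)) / x ** 1.5


def majority_ratio(point, minority, majority, k=5):
    """Fraction of majority-class points among the k nearest neighbours."""
    labelled = [(math.dist(point, q), 0) for q in minority if q is not point]
    labelled += [(math.dist(point, q), 1) for q in majority]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


def levy_oversample(minority, majority, n_new, k=5, c=1.5, seed=0):
    """Draw n_new synthetic minority points (illustrative assumption only).

    Base samples are picked with probability proportional to the Levy
    density of their majority ratio: borderline samples (ratio near 0.5)
    are picked most often, samples near the majority class (ratio near 1)
    less often, and samples deep inside the minority class (ratio near 0)
    least often. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    weights = [levy_pdf(majority_ratio(p, minority, majority, k), c)
               for p in minority]
    if sum(weights) == 0:
        # No sample has majority neighbours: fall back to uniform sampling.
        weights = [1.0] * len(minority)
    synthetic = []
    for _ in range(n_new):
        base = rng.choices(minority, weights=weights, k=1)[0]
        # SMOTE-style step: interpolate towards another minority sample.
        partner = rng.choice([q for q in minority if q is not base])
        t = rng.random()
        synthetic.append(tuple(b + t * (p - b) for b, p in zip(base, partner)))
    return synthetic
```

With these assumed settings, `levy_pdf(0.5) > levy_pdf(1.0) > levy_pdf(0.1)`, which reproduces the ordering stated in the abstract: borderline first, near-majority second, near-minority last. The asymmetry comes from the shape of the Lévy density, which rises steeply from zero and then decays with a heavy tail.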