Imbalanced Data Classification Method Based on Boundary Mixed Resampling

doi:10.3778/j.issn.1002-8331.1901-0083

Abstract

In imbalanced data classification, a resampling step should synthesize informative new samples and delete original samples whose removal does not affect the classifier. To this end, an imbalanced data classification method based on boundary mixed resampling is proposed. First, the concept of the k-outlier is introduced to separate the data set into boundary and non-boundary samples, which are then treated differently. An improved SMOTE algorithm takes the boundary samples of the minority class as target points for synthesizing new samples, while the non-boundary samples of the majority class are undersampled based on distance, bringing the classes to an approximate balance. Experimental comparisons show that, while maintaining a good G-mean value, the proposed algorithm improves the classification accuracy of minority samples to some extent.

Key words: k-outlier, resampling, boundary points, imbalanced data classification
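
The pipeline described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the abstract does not define the k-outlier degree, the improved SMOTE variant, or the distance criterion, so the sketch assumes stand-ins for all three. A sample is treated as a boundary point if any of its k nearest neighbours has the other label, new points are made by plain SMOTE-style interpolation, and the majority points dropped are the non-boundary ones farthest from the minority class. The function names, the even split of the class gap between oversampling and undersampling, and the parameters `k` and `seed` are all hypothetical.

```python
import math
import random

def k_nearest(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def is_boundary(i, X, y, k):
    """Assumed stand-in for the k-outlier test: sample i is a boundary
    point if any of its k nearest neighbours carries the other label."""
    return any(y[j] != y[i] for j in k_nearest(i, X, k))

def boundary_mixed_resample(X, y, minority=1, k=5, seed=0):
    """Return a roughly balanced copy of (X, y); binary labels assumed."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    gap = len(maj_idx) - len(min_idx)
    if gap <= 0 or len(min_idx) < 2:
        return list(X), list(y)          # nothing to do, or too few to interpolate
    border = [is_boundary(i, X, y, k) for i in range(len(X))]

    # Oversample: interpolate from a minority boundary point toward one of
    # its k nearest minority neighbours, closing half of the class gap.
    X_min = [X[i] for i in min_idx]
    seeds = [pos for pos, i in enumerate(min_idx) if border[i]]
    synth = []
    for _ in range(gap // 2 if seeds else 0):
        s = rng.choice(seeds)
        n = rng.choice(k_nearest(s, X_min, k))
        lam = rng.random()
        synth.append([a + lam * (b - a) for a, b in zip(X_min[s], X_min[n])])

    # Undersample: among non-boundary majority points, drop the ones
    # farthest from the minority class until the rest of the gap is closed.
    safe = [i for i in maj_idx if not border[i]]
    safe.sort(key=lambda i: min(math.dist(X[i], m) for m in X_min), reverse=True)
    drop = set(safe[:gap - len(synth)])
    keep = [i for i in range(len(X)) if i not in drop]
    return ([X[i] for i in keep] + synth,
            [y[i] for i in keep] + [minority] * len(synth))

def g_mean(y_true, y_pred, minority=1):
    """G-mean: square root of minority recall times majority recall.
    Assumes both classes occur in y_true."""
    pos = [p == minority for t, p in zip(y_true, y_pred) if t == minority]
    neg = [p != minority for t, p in zip(y_true, y_pred) if t != minority]
    return math.sqrt((sum(pos) / len(pos)) * (sum(neg) / len(neg)))
```

Samples are passed as a list of coordinate sequences and labels as a parallel list, for example `boundary_mixed_resample(X, y, minority=1, k=5)`. If there are too few non-boundary majority points to remove, the result stays somewhat imbalanced, which matches the abstract's wording of a "basic balance" rather than an exact one.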

HOU Beibei, LIU Sanyang, PU Shiye. Imbalanced Data Classification Method Based on Boundary Mixed Resampling[J]. Computer Engineering and Applications, 2020, 56(1): 46-52.

侯贝贝，刘三阳，普事业. 基于边界混合重采样的非平衡数据分类方法[J]. 计算机工程与应用, 2020, 56(1): 46-52.

