Computer Engineering and Applications, 2019, Vol. 55, Issue (17): 68-75. DOI: 10.3778/j.issn.1002-8331.1804-0307
ZHANG Ming, HU Xiaohui, WU Jiaxin
张明,胡晓辉,吴嘉昕
Abstract: To address the poor classification performance on imbalanced datasets, a novel imbalanced dataset classification algorithm based on mixed sampling (BSI) is proposed. The method first introduces the coefficient of variation to identify the samples in the sparse domain and those in the dense domain, and then treats the two domains differently. For minority samples in the sparse domain, an oversampling method (BSMOTE) that improves on the SMOTE algorithm is proposed; for majority samples in the dense domain, an improved undersampling method (IS) is proposed. Experiments on six imbalanced datasets show that, compared with traditional algorithms, the proposed algorithm achieves higher G-mean, F-value, and AUC values and effectively improves the overall classification performance on imbalanced datasets.
Key words: imbalanced datasets, coefficient of variation, SMOTE algorithm, undersampling
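The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BSI, BSMOTE, or IS procedure, because the abstract does not say how the coefficient of variation is applied or how the sampling steps are modified. The function names below are hypothetical. The sketch shows only the standard definitions: the coefficient of variation (standard deviation divided by mean), the basic SMOTE interpolation between two minority samples, and the G-mean and F-value metrics. The F-value is assumed here to be F1, and the minority class is treated as the positive class.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / mean


def smote_interpolate(sample, neighbor, rng=random):
    """Create one synthetic point on the segment between two minority samples.

    This is the basic SMOTE interpolation step, not the BSMOTE variant.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


def g_mean_and_f_value(y_true, y_pred, positive=1):
    """Compute G-mean and F-value (assumed F1) with `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0
    # G-mean is the geometric mean of minority recall and majority specificity.
    return math.sqrt(recall * specificity), f_value
```

G-mean rewards a classifier only when it performs well on both classes, which is why it is commonly reported alongside F-value and AUC for imbalanced data.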
ZHANG Ming, HU Xiaohui, WU Jiaxin. Imbalanced Data Processing Algorithm Based on Mixed Sampling[J]. Computer Engineering and Applications, 2019, 55(17): 68-75.
张明,胡晓辉,吴嘉昕. 基于混合采样的不平衡数据集算法研究[J]. 计算机工程与应用, 2019, 55(17): 68-75.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1804-0307
http://cea.ceaj.org/EN/Y2019/V55/I17/68