New algorithm of AdaBoost for unbalanced datasets

Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (28): 169-172.

• Graphics, Images, and Pattern Recognition •

WANG Canwei (1, 2, 4), YU Zhilou (3), ZHANG Huaxiang (1)

1. Department of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
2. Department of Information and Engineering, Shandong Trade Union Cadre Institute, Jinan 250100, China
3. Inspur Group, Jinan 250101, China
4. Shandong Province Distributed Computer Software New Technique Key Laboratory, Jinan 250014, China

Online: 2011-10-01 Published: 2011-10-01

Abstract

Abstract: This paper proposes ILAdaboost, a new AdaBoost training method suited to unbalanced datasets. In each iteration, the algorithm evaluates the original dataset with the base classifier just learned and, according to the result, divides the dataset into four subsets. It then re-samples from the four subsets to form a balanced dataset on which the base classifier of the next iteration is trained. Because re-sampling favors the minority class and the misclassified majority-class samples, the decision surface of the combined classifier moves away from the minority class. In experiments on 10 typical unbalanced datasets from UCI, the algorithm improves the minority-class accuracy and the GMA while maintaining the majority-class accuracy.
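The re-sampling step described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not define the four subsets or the sampling proportions, so the sketch assumes the subsets are class (minority or majority) crossed with whether the current base classifier was correct, that the minority class is drawn uniformly, and that a hypothetical parameter `wrong_share` controls how strongly majority draws favor misclassified samples. It also assumes that GMA denotes the geometric mean of the two per-class accuracies. All function names are made up for the example.

```python
import numpy as np

def split_four(y, y_pred, minority_label=1):
    """Partition sample indices by class and by whether the current
    base classifier predicted them correctly (assumed definition)."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    is_min = y == minority_label
    correct = y_pred == y
    return {
        "min_correct": np.flatnonzero(is_min & correct),
        "min_wrong": np.flatnonzero(is_min & ~correct),
        "maj_correct": np.flatnonzero(~is_min & correct),
        "maj_wrong": np.flatnonzero(~is_min & ~correct),
    }

def _draw(rng, idx, n):
    """Sample n indices with replacement; return nothing if impossible."""
    if n <= 0 or idx.size == 0:
        return np.empty(0, dtype=np.intp)
    return rng.choice(idx, size=n, replace=True)

def resample_balanced(subsets, n_per_class, wrong_share=0.7, seed=None):
    """Draw n_per_class indices from each class for the next round.
    wrong_share (hypothetical) is the fraction of majority draws taken
    from the misclassified majority subset when both subsets exist."""
    rng = np.random.default_rng(seed)
    minority = np.concatenate([subsets["min_correct"], subsets["min_wrong"]])
    maj_wrong, maj_correct = subsets["maj_wrong"], subsets["maj_correct"]
    if maj_wrong.size == 0:
        n_wrong = 0
    elif maj_correct.size == 0:
        n_wrong = n_per_class
    else:
        n_wrong = int(round(wrong_share * n_per_class))
    return np.concatenate([
        _draw(rng, minority, n_per_class),
        _draw(rng, maj_wrong, n_wrong),
        _draw(rng, maj_correct, n_per_class - n_wrong),
    ])

def gmean_accuracy(y, y_pred, minority_label=1):
    """Geometric mean of minority and majority accuracy (assumed GMA)."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    is_min = y == minority_label
    acc_min = np.mean(y_pred[is_min] == minority_label)
    acc_maj = np.mean(y_pred[~is_min] != minority_label)
    return float(np.sqrt(acc_min * acc_maj))

# Toy round: 2 minority samples (label 1) and 8 majority samples (label 0).
y = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 1])
subsets = split_four(y, y_pred)
next_round_idx = resample_balanced(subsets, n_per_class=4, seed=0)
```

In a full boosting loop, the base classifier of the next iteration would be trained on the samples selected by `next_round_idx`, and the partition would be recomputed from its predictions on the original dataset.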

Key words: unbalanced dataset, ensemble learning, AdaBoost, re-sample

WANG Canwei, YU Zhilou, ZHANG Huaxiang. New algorithm of AdaBoost for unbalanced datasets[J]. Computer Engineering and Applications, 2011, 47(28): 169-172.
