Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (12): 69-74.


Ensemble learning algorithm by consecutively removing training samples

ZHOU Yi, CHEN Ke, ZHU Bo, LIU Hao, WANG Yufan, WU Jigang, SUN Xuemei   

  1. School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China
  Online: 2016-06-15    Published: 2016-06-14


Abstract: Ensemble learning, which integrates multiple weak learners to produce a stronger learner, is one of the key research areas in machine learning. Although a number of algorithms have been proposed for generating base learners, these algorithms usually lack robustness. This study proposes a novel ensemble learning algorithm, namely the Ensemble Learning Algorithm by Consecutively Removing Training Samples (ELACRTS), which possesses the merits of both the boosting and bagging methods. By removing the samples with high confidence from the training set, the training space is gradually reduced, which allows sufficient learning of under-represented samples. The ELACRTS method generates a series of decreasing training subspaces and therefore produces a number of diverse base classifiers. As in boosting and bagging, voting is employed to integrate the predictions of the multiple base classifiers. Ten-fold cross-validation is used to assess the performance of the proposed ELACRTS method. Extensive experiments on 8 UCI datasets and 7 base classifiers demonstrate that the ELACRTS algorithm overall outperforms the boosting and bagging algorithms.

Key words: ensemble learning, base classifier, training subspace, decreasing, confidence level
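The procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy nearest-centroid base learner, its distance-based confidence measure, and all function names (`elacrts_fit`, `elacrts_predict`, `drop_frac`, `n_rounds`) are hypothetical stand-ins. The structure, however, follows the abstract: train a base classifier, remove the most confidently classified training samples, repeat on the shrinking training set, and combine all base classifiers by majority vote.

```python
# Sketch of the ELACRTS idea: consecutively remove high-confidence
# training samples, producing diverse base classifiers on shrinking
# training subspaces, then combine them by voting.
import math
from collections import Counter


class CentroidClassifier:
    """Toy base learner (illustrative assumption): nearest class
    centroid, with a distance-margin confidence score."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            # Per-coordinate mean of the points in this class.
            self.centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def _dist(self, a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    def predict_one(self, x):
        return min(self.centroids, key=lambda lbl: self._dist(x, self.centroids[lbl]))

    def confidence(self, x, y_true):
        # Large when the sample is much closer to its own class
        # centroid than to the nearest rival centroid.
        d_own = self._dist(x, self.centroids[y_true])
        d_rival = min(self._dist(x, c)
                      for lbl, c in self.centroids.items() if lbl != y_true)
        return d_rival - d_own


def elacrts_fit(X, y, n_rounds=3, drop_frac=0.3):
    """Train base classifiers on consecutively shrinking training sets."""
    models = []
    data = list(zip(X, y))
    for _ in range(n_rounds):
        Xs, ys = zip(*data)
        clf = CentroidClassifier().fit(list(Xs), list(ys))
        models.append(clf)
        if len(data) <= 2 or len(set(ys)) < 2:
            break  # Too few samples or classes left to keep shrinking.
        # Drop the most confidently classified samples so later rounds
        # focus on the harder, under-represented ones.
        data.sort(key=lambda xy: clf.confidence(xy[0], xy[1]), reverse=True)
        data = data[int(len(data) * drop_frac):]
    return models


def elacrts_predict(models, x):
    # Majority vote over all base classifiers, as in bagging/boosting.
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]
```

On a tiny two-cluster dataset, each round refits the centroids on a training set with its easiest points removed, so the ensemble members differ while the vote stays stable; a real implementation would plug in the paper's seven base classifiers in place of the toy centroid learner.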
