
Computer Engineering and Applications, 2025, Vol. 61, Issue 2: 59-72. DOI: 10.3778/j.issn.1002-8331.2407-0160
ZHU Shineng (朱诗能), HAN Meng (韩萌), YANG Shurong (杨书蓉), DAI Zhenlong (代震龙), YANG Wenyan (杨文艳), DING Jian (丁剑)
Online: 2025-01-15
Published: 2025-01-15
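Abstract: In real-world scenarios, learning from data streams faces the class imbalance problem: learning algorithms cannot effectively identify minority-class samples because training data for them is scarce. To present the state of research and the open challenges in ensemble classification for imbalanced data streams, this survey draws on recent literature in the field and analyzes and summarizes ensemble methods from two perspectives: decision rules based on weighting, selection, and voting, and learning modes based on cost-sensitive learning, active learning, and incremental learning. It also compares the performance of algorithms that were evaluated on the same datasets. For the imbalance problem in different types of complex data streams, the ensemble classification algorithms are summarized along four aspects (concept drift, multi-class, noise, and class overlap), and the time complexity of classic algorithms is analyzed. Finally, directions for further research on ensemble strategies are proposed for the classification challenges posed by imbalance in dynamic data streams, data streams with missing information, multi-label data streams, and uncertain data streams.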
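As a rough illustration of the weighting and voting decision rules named in the abstract, the sketch below weights each ensemble member by its G-mean over a sliding window of recent labelled examples and then takes a weighted vote. It is not taken from the surveyed paper or from any specific algorithm it reviews: the choice of G-mean as the weight, the class `WindowedGMean`, the function `weighted_vote`, the two toy members, and the synthetic 10%-minority stream are all assumptions made for the example.

```python
from collections import deque
from math import sqrt


class WindowedGMean:
    """Hypothetical helper: G-mean over the last `size` labelled examples."""

    def __init__(self, size=200):
        self.window = deque(maxlen=size)  # (true_label, predicted_label) pairs

    def update(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def value(self):
        # G-mean = sqrt(recall on class 0 * recall on class 1)
        hits = {0: 0, 1: 0}
        totals = {0: 0, 1: 0}
        for y_true, y_pred in self.window:
            totals[y_true] += 1
            hits[y_true] += int(y_true == y_pred)
        if totals[0] == 0 or totals[1] == 0:
            return 0.5  # neutral weight until both classes have been seen
        return sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def weighted_vote(members, scores, x):
    """Return the binary label with the larger total G-mean weight."""
    tally = {0: 0.0, 1: 0.0}
    for predict, score in zip(members, scores):
        tally[predict(x)] += score.value()
    return 1 if tally[1] > tally[0] else 0


# Two toy members (assumptions): one ignores the minority class entirely,
# the other separates the classes with a fixed threshold.
def always_majority(x):
    return 0


def threshold(x):
    return 1 if x > 0.5 else 0


members = [always_majority, always_majority, threshold]
scores = [WindowedGMean() for _ in members]

# Synthetic stream with 10% minority examples (label 1).
stream = [(0.9, 1) if i % 10 == 0 else (0.1, 0) for i in range(100)]
for x, y in stream:
    for predict, score in zip(members, scores):
        score.update(y, predict(x))  # record each member's outcome on the labelled example

minority_vote = weighted_vote(members, scores, 0.9)
majority_vote = weighted_vote(members, scores, 0.1)
```

In this toy setting an unweighted majority vote would label the minority example as class 0, because two of the three members always predict the majority class. Weighting by G-mean drives those members' weights to zero, since their minority recall is zero, so the threshold member decides the vote.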
ZHU Shineng, HAN Meng, YANG Shurong, DAI Zhenlong, YANG Wenyan, DING Jian. Ensemble Classification Methods for Imbalanced Data Streams[J]. Computer Engineering and Applications, 2025, 61(2): 59-72. (Original Chinese title: 不平衡数据流的集成分类方法综述)