
Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (23): 268-277. DOI: 10.3778/j.issn.1002-8331.2207-0442
• Engineering and Applications •
REN Yanping, ZHENG Zhong, JIANG Yifei, YAN Yuanting, ZHANG Yanping
Online: 2022-12-01
Published: 2022-12-01
任艳平,郑 重,江一飞,严远亭,张燕平
REN Yanping, ZHENG Zhong, JIANG Yifei, YAN Yuanting, ZHANG Yanping. Posterior Probability and Density-Based Imbalanced Data Undersampling[J]. Computer Engineering and Applications, 2022, 58(23): 268-277.
任艳平, 郑 重, 江一飞, 严远亭, 张燕平. 融合后验概率和密度的不平衡数据欠采样方法[J]. 计算机工程与应用, 2022, 58(23): 268-277.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2207-0442