
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (7): 81-95. DOI: 10.3778/j.issn.1002-8331.2406-0410
LIU Zixuan, DU Jianqiang, LUO Jigen, HUANG Qiang, HE Jia, LI Yiwen, QIN Ziyu
Online: 2025-04-01
Published: 2025-04-01
Abstract: Feature selection plays an important role in preprocessing high-dimensional data. By picking out, from the original feature set, the features that most improve model performance, it can effectively reduce data dimensionality, increase model accuracy, and lower the risk of overfitting. Stability is a key research topic in feature selection that cannot be overlooked: it refers to the robustness of a feature selection method against small perturbations of the training samples. This survey analyzes in depth the multiple causes of instability in the feature selection process; systematically summarizes and compares methods for improving stability, detailing the goals, evaluation criteria, distinctive advantages, and potential drawbacks of each class of methods; describes the properties of metrics for assessing feature selection stability, and analyzes and classifies these metrics in detail; and discusses open problems and future directions in stable feature selection, in the hope of providing a valuable reference for subsequent research and practice.
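The notion of stability described above — robustness of the selected feature subset to small perturbations of the training samples — can be made concrete with a small experiment. The sketch below is illustrative only and not from the paper: it repeatedly bootstraps a synthetic dataset, reselects the top-k features by a simple correlation filter, and reports the mean pairwise Jaccard similarity of the selected subsets (one of many possible stability measures; the names `select_top_k` and `jaccard_stability` are my own).

```python
import random

def select_top_k(X, y, k):
    """Rank features by absolute Pearson correlation with y; keep the top k."""
    n, d = len(X), len(X[0])
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        mx, my = sum(col) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        scores.append(abs(cov / (vx * vy)) if vx and vy else 0.0)
    return set(sorted(range(d), key=lambda j: -scores[j])[:k])

def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity over the selected feature subsets."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Synthetic data: features 0 and 1 drive y; the remaining features are noise.
random.seed(0)
n, d = 200, 10
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [row[0] + row[1] + random.gauss(0, 0.1) for row in X]

# Perturb the training set by bootstrap resampling, reselecting each time.
subsets = []
for _ in range(10):
    idx = [random.randrange(n) for _ in range(n)]
    subsets.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k=2))

print(jaccard_stability(subsets))  # near 1.0 indicates a stable selector
```

A score near 1.0 means the same features are chosen under every resampling; scores drop toward 0 when perturbations send the selector to disjoint subsets, which is exactly the failure mode stability research targets.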
LIU Zixuan, DU Jianqiang, LUO Jigen, HUANG Qiang, HE Jia, LI Yiwen, QIN Ziyu. Review of Stability Feature Selection[J]. Computer Engineering and Applications, 2025, 61(7): 81-95.