Interpretable Association Rule Defect Prediction Model Combining Counterfactuals and Multi-Objective Optimization

doi:10.3778/j.issn.1002-8331.2501-0275

Abstract

Software defect prediction is key to ensuring software quality. To improve prediction performance, researchers have designed a variety of defect prediction models, but most of them offer little transparency about how their predictions are produced, which makes it difficult for developers to understand a model's internal logic and decision-making process. This lack of interpretability limits the credibility of the models and hinders their adoption in practical development. To address this problem, this paper combines multiple association rules into an interpretable multi-objective optimization model called MoCFR. MoCFR uses a counterfactual explanation method for feature selection, determining the importance score of each feature from the rate at which that feature changes in the counterfactual samples. On this basis, the model applies multi-objective optimization to construct an association rule classifier while simultaneously optimizing three key metrics: classification error, average number of rules, and confidence. Experimental results on the PROMISE dataset show that MoCFR outperforms existing rule-based classification models in terms of classification error and significantly reduces the number of rules compared with similar multi-objective optimization models.

Keywords: software defect prediction, association rule mining, multi-objective optimization, feature selection
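
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes that a feature's importance score is the fraction of (original, counterfactual) pairs in which that feature's value changed, and that candidate rule classifiers are compared by Pareto dominance over (classification error, average number of rules, negated confidence). The abstract does not give the exact definitions, the counterfactual generator, or the optimizer used by MoCFR, so the function names, the tolerance `tol`, and the toy data below are hypothetical and not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def feature_change_rates(
    originals: Sequence[Sequence[float]],
    counterfactuals: Sequence[Sequence[float]],
    tol: float = 1e-9,
) -> List[float]:
    """Return, per feature, the fraction of sample pairs in which it changed.

    Assumption: counterfactuals[i] is the counterfactual generated for
    originals[i]; how it is generated is outside this sketch.
    """
    if not originals or len(originals) != len(counterfactuals):
        raise ValueError("need equally sized, non-empty sample lists")
    n_features = len(originals[0])
    changed = [0] * n_features
    for x, cf in zip(originals, counterfactuals):
        for j in range(n_features):
            if abs(x[j] - cf[j]) > tol:
                changed[j] += 1
    return [count / len(originals) for count in changed]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return order[:k]


Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on one.

    All objectives are minimized; confidence is stored negated (assumption).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy data: four modules with three metrics each, and their counterfactuals.
originals = [[10, 2, 0.5], [12, 3, 0.7], [8, 2, 0.4], [15, 5, 0.9]]
counterfactuals = [[6, 2, 0.5], [7, 3, 0.6], [8, 1, 0.4], [9, 5, 0.9]]
scores = feature_change_rates(originals, counterfactuals)
selected = select_top_k(scores, k=2)

# Toy candidates: (classification error, average number of rules, -confidence).
candidates = [(0.20, 5.0, -0.90), (0.25, 3.0, -0.85), (0.30, 6.0, -0.80)]
front = pareto_front(candidates)
```

In this toy run the first feature changes in three of the four counterfactuals and therefore receives the highest score, and the third candidate is discarded because the first is at least as good on all three objectives. A real multi-objective optimizer would search over rule subsets instead of filtering a fixed list.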

YU Qiao, JIANG Jiaxuan, REN Siyu, ZHU Yi. Interpretable Association Rule Defect Prediction Model Combining Counterfactuals and Multi-Objective Optimization[J]. Computer Engineering and Applications, 2025, 61(22): 339-352.
