Bagging Heterogeneous Ensemble Code Smell Detection and Refactoring Priority Division

doi:10.3778/j.issn.1002-8331.2305-0218

Abstract: Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and increase change- and fault-proneness. Previous research has focused on detecting code smells with a single model and has not provided developers with refactoring recommendations. To address these problems, a code smell detection and refactoring priority division method based on a Bagging heterogeneous ensemble model is proposed. The method exploits the heterogeneity among classifiers and uses an F1 integration strategy to detect three code smells: Complex Class, Long Method, and Spaghetti Code. It then converts the smell probability output by the model into a possibility distribution to provide developers with refactoring recommendations. Experiments on 32 versions of 6 open-source systems evaluate: (1) the stability of the base classifiers and their relationship to code smells; (2) the performance of the Bagging heterogeneous ensemble model in detecting the three code smells; (3) the validity of using the possibility distribution derived from the smell probability as a refactoring priority. The results show that the best base classifier varies with the code smell type. Compared with the base classifiers, the Bagging heterogeneous ensemble model improves F1 by 0.06~40.51 percentage points and AUC by 0.45~28.37 percentage points. Finally, a Cohen's Kappa agreement test between the refactoring priorities of the Bagging heterogeneous ensemble model and those of six respondents shows that the two are highly consistent.

Key words: code smell, machine learning, ensemble learning, software refactoring, possibility distribution
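
The abstract does not spell out how the smell probability is turned into a possibility distribution. As a minimal sketch, assuming one commonly used probability-to-possibility transformation, pi_i = sum_j min(p_i, p_j), the step could look like the following. The function name `to_possibility` and the example values are hypothetical, and the formula is an assumption rather than the authors' confirmed method.

```python
from typing import List, Sequence


def to_possibility(probs: Sequence[float]) -> List[float]:
    """Convert a discrete probability distribution into a possibility
    distribution using pi_i = sum_j min(p_i, p_j).

    Assumption: this transformation is chosen for illustration only;
    the abstract does not state which formula the method uses.
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    # Each outcome's possibility accumulates, over all outcomes,
    # the smaller of the two probabilities being compared.
    return [sum(min(p_i, p_j) for p_j in probs) for p_i in probs]


if __name__ == "__main__":
    # Hypothetical classifier output for one class: P(smelly), P(clean).
    smell_probability = 0.3
    possibility = to_possibility([smell_probability, 1.0 - smell_probability])
    print(possibility)
```

Under this transformation the most probable outcome always receives possibility 1, and less probable outcomes receive values that are never below their probabilities. How the resulting degrees are mapped to refactoring priority levels is not described in the abstract, so that step is left out here.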

WU Haitao, CAI Yongqi, GAO Jianhua. Bagging Heterogeneous Ensemble Code Smell Detection and Refactoring Priority Division[J]. Computer Engineering and Applications, 2024, 60(3): 138-147.
