Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (14): 362-376. DOI: 10.3778/j.issn.1002-8331.2403-0221

• Engineering and Applications •

Empirical Study on Impact of Time-Series Factor on Performance of Just-in-Time Software Defect Prediction

ZHANG Yu, YU Qiao, ZHU Yi, JIANG Shujuan, ZHANG Shutao   

  1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  3. Jiangsu Xukuang Energy Co., Ltd., Xuzhou, Jiangsu 220009, China
  • Online: 2025-07-15 Published: 2025-07-15

时序因素对即时软件缺陷预测性能影响的实证研究

张雨,于巧,祝义,姜淑娟,张淑涛   

  1. 江苏师范大学 计算机科学与技术学院,江苏 徐州 221116
  2. 中国矿业大学 计算机科学与技术学院,江苏 徐州 221116
  3. 江苏徐矿能源股份有限公司,江苏 徐州 220009

Abstract: Just-in-time software defect prediction (JIT-SDP) aims to predict whether code changes committed by developers contain defects. In recent years, owing to its fine granularity, immediacy, and ease of traceability, JIT-SDP has become an active research topic in defect prediction. Code change commits are inherently time-ordered, yet most existing studies ignore the impact of the time-series factor on JIT-SDP performance. Exploring how commit time affects prediction performance is therefore important. This paper investigates the impact of the time-series factor on both within-project and cross-project JIT-SDP through empirical studies on nine JIT-SDP datasets with three models: Random Forest, CNN, and XGBoost. The results indicate that, in within-project defect prediction, the closer in time the training set is to the testing set, the better the model performs. Moreover, the performance gap between cross-project and within-project defect prediction is smaller in the time-series scenario than in the non-time-series scenario. The time-series factor should therefore be fully considered in JIT-SDP research, and when selecting training sets, priority should be given to data closer in time to the testing set.

Key words: just-in-time software defect prediction (JIT-SDP), time-series, cross-project defect prediction

摘要: 即时软件缺陷预测是针对开发者提交的代码变更是否存在缺陷进行预测。近年来,由于其细粒度、即时性、易追溯的特点,即时软件缺陷预测成为了缺陷预测领域的研究热点。代码变更提交具有时间特性,然而,现有研究大多忽略了时序因素对即时软件缺陷预测的影响。因此,探究代码变更提交时间对即时软件缺陷预测性能的影响规律具有重要意义。探究了时序因素对项目内和跨项目即时软件缺陷预测性能的影响,采用随机森林、CNN和XGBoost三种模型在9个即时软件缺陷预测数据集上展开了实证研究。研究结果表明:在项目内缺陷预测中,训练集与测试集时间越接近,模型性能越好;与非时序场景相比,时序场景下的跨项目缺陷预测与项目内缺陷预测的性能差距更小。因此,在即时软件缺陷预测研究中应该充分考虑时序因素的影响,在进行训练集的选择时应优先考虑与测试集时间相距较近的数据集。

关键词: 即时软件缺陷预测(JIT-SDP), 时序因素, 跨项目缺陷预测
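
The distinction the abstract draws between the time-series and non-time-series scenarios can be illustrated with a minimal Python sketch. It is not the authors' experimental protocol, which the abstract does not describe. The `Commit` record, the function names, and the 80/20 split are all hypothetical choices made for illustration. The sketch contrasts a chronological split (train on older commits, test on the newest) with a random split, and cuts the training commits into consecutive windows so that a model could be trained on data at different temporal distances from the testing set.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; field names are illustrative assumptions."""
    commit_id: str
    timestamp: int       # e.g. Unix seconds of the commit
    is_defective: bool   # label: did the change introduce a defect?


def chronological_split(commits, test_fraction=0.2):
    """Time-series scenario: train on older commits, test on the newest ones."""
    ordered = sorted(commits, key=lambda c: c.timestamp)
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def random_split(commits, test_fraction=0.2, seed=0):
    """Non-time-series scenario: commit order is ignored, so the training
    set may contain commits made after some of the test commits."""
    shuffled = list(commits)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def training_windows(ordered_train, n_windows):
    """Cut time-ordered training commits into consecutive windows, oldest
    first; the last window is the one closest in time to the testing set.
    Commits left over after even division are dropped."""
    size = len(ordered_train) // n_windows
    return [ordered_train[i * size:(i + 1) * size] for i in range(n_windows)]


# Toy data: ten commits with arbitrary timestamps and labels.
commits = [
    Commit(f"c{i}", ts, i % 3 == 0)
    for i, ts in enumerate([50, 10, 90, 30, 70, 20, 100, 40, 80, 60])
]
train, test = chronological_split(commits)
windows = training_windows(train, n_windows=4)
```

In an experiment of this kind, one would fit the same classifier on each window in turn and evaluate it on the same `test` set; the abstract's finding corresponds to the later windows yielding better performance than the earlier ones.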