Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (8): 53-58. DOI: 10.3778/j.issn.1002-8331.1806-0091

• Theory, Research and Development •

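Cross-Project Software Defect Prediction Based on Multi-Strategy Feature Filtering

LIU Shuyi, ZHAI Ye, LIU Dongsheng

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2019-04-15 Published: 2019-04-15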

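Abstract: Software defect data used in cross-project defect prediction often contain irrelevant information and redundant data. To address this problem, a cross-project software defect prediction method based on Multi-Policy Feature Filtering (MPFF) is proposed. First, a multi-strategy filtering method and an oversampling method are used to preprocess the data. Then a cost-sensitive domain adaptation method is used for classification; during classification, a small amount of labeled target-project data is used to reduce the distribution difference between projects. Finally, prediction experiments under different metrics are carried out on the AEEEM, NASA MDP, and SOFTLAB datasets. The experimental results show that, under homogeneous metrics, MPFF achieves the best prediction performance compared with the Burak filter, Peters filter, TCA+, and TrAdaBoost methods.

Key words: cross-project software defect prediction, irrelevant information, data redundancy, cost sensitive, homogeneous metric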

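
The preprocessing stage named in the abstract (feature filtering followed by oversampling) can be illustrated with a minimal sketch. The abstract does not state which filtering strategies or which oversampling algorithm MPFF uses, so everything below is an assumption made for illustration: the function names `filter_features` and `random_oversample`, the thresholds `relevance_min` and `redundancy_max`, the use of Pearson correlation as the relevance and redundancy criterion, and plain random duplication as the oversampler. Binary defect labels with both classes present are also assumed.

```python
import numpy as np


def filter_features(X, y, relevance_min=0.1, redundancy_max=0.9):
    """Return sorted indices of the columns of X kept by two simple filters.

    Hypothetical stand-in for multi-strategy feature filtering:
    1. drop features weakly correlated with the defect label (irrelevant);
    2. drop a feature that is highly correlated with an already kept,
       at least as relevant feature (redundant).
    relevance_min is assumed to be positive.
    """
    n_features = X.shape[1]
    relevance = np.zeros(n_features)
    for j in range(n_features):
        if X[:, j].std() > 0:  # constant columns keep relevance 0
            relevance[j] = abs(np.corrcoef(X[:, j], y)[0, 1])

    kept = []
    # Visit candidate features from most to least relevant.
    for j in np.argsort(-relevance):
        if relevance[j] < relevance_min:
            break  # all remaining features are even less relevant
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > redundancy_max
            for k in kept
        )
        if not redundant:
            kept.append(int(j))
    return sorted(kept)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until classes balance."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]
```

In this sketch the source-project data would first be reduced with `X[:, filter_features(X, y)]` and then balanced with `random_oversample`, before any classifier is trained. The cost-sensitive domain adaptation step mentioned in the abstract is not covered here.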