Computer Engineering and Applications, 2019, Vol. 55, Issue (14): 61-68. DOI: 10.3778/j.issn.1002-8331.1811-0103

• Theory and R&D •

FSDNP: Feature Selection Method for Software Defect Number Prediction

LI Yefei1,2, GUAN Guofei2, GE Chonghui2, CHEN Xiang3, NI Chao1, QIAN Zhuzhong1

  1. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China
  2. Jiangsu Frontier Electric Technology Co., Ltd., Nanjing 210000, China
  3. College of Computer Science & Technology, Nantong University, Nantong, Jiangsu 226019, China
  • Publication date: 2019-07-15  Release date: 2019-07-11

Abstract: Previous work on software defect prediction has mainly focused on classification, that is, deciding whether a software module is defect-prone. How to quantify the number of defects in a software module has received little attention. To address this problem, this paper proposes FSDNP, a two-stage feature selection method for software defect number prediction that consists of a feature clustering phase and a feature selection phase. In the feature clustering phase, a density peaks clustering algorithm groups highly correlated features into clusters; in the feature selection phase, three heuristic ranking strategies remove redundant and irrelevant features from each cluster. FSDNP is compared with six classic baseline methods on the PROMISE dataset using average absolute error and average relative error. The experimental results show that FSDNP effectively removes redundant and irrelevant features and builds effective software defect number prediction models.
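The abstract describes the two phases only at a high level, so the following Python sketch illustrates the general idea under explicit assumptions rather than reproducing FSDNP itself. Features are compared with an assumed distance of 1 - |Pearson r|, grouped with the standard density peaks clustering algorithm (Rodriguez and Laio, 2014), and one feature per cluster is kept using a stand-in relevance score (absolute correlation with the defect count). The cutoff `d_c`, the number of clusters, the relevance score, and all function names are hypothetical; the paper's three heuristic ranking strategies are not specified in the abstract and are not modeled here. The two error measures are written in a commonly used form, and the `+ 1` in the relative-error denominator is an assumption.

```python
import numpy as np


def correlation_distance(X):
    """Assumed feature-to-feature distance: 1 - |Pearson r| between columns of X."""
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant columns give NaN
    return 1.0 - np.abs(corr)


def density_peaks(D, d_c, n_centers):
    """Density peaks clustering (Rodriguez & Laio, 2014) over a distance matrix D."""
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1          # neighbours within the cutoff, excluding self
    order = np.argsort(-rho, kind="stable")  # indices by decreasing local density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()            # densest item: largest distance by convention
            continue
        higher = order[:rank]                # items ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], nearest_higher[i] = D[i, j], j
    # Centres: the densest item plus the items with the largest gamma = rho * delta.
    gamma = rho * delta
    ranked = [c for c in np.argsort(-gamma, kind="stable") if c != order[0]]
    centers = [order[0]] + ranked[: n_centers - 1]
    labels = np.full(n, -1)
    for k, c in enumerate(centers):
        labels[c] = k
    # Every other item joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


def select_features(X, y, d_c=0.3, n_clusters=3):
    """Phase 1: cluster correlated features; phase 2: keep the most relevant one per cluster.

    d_c, n_clusters and the relevance score below are hypothetical choices.
    """
    labels = density_peaks(correlation_distance(X), d_c, n_clusters)
    # Stand-in relevance score (not FSDNP's ranking strategies): |Pearson r| with defect count.
    relevance = np.array([abs(np.nan_to_num(np.corrcoef(X[:, f], y)[0, 1]))
                          for f in range(X.shape[1])])
    selected = [int(m[np.argmax(relevance[m])])
                for m in (np.flatnonzero(labels == k) for k in np.unique(labels))]
    return sorted(selected), labels


def average_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_pred - y_true)))


def average_relative_error(y_true, y_pred):
    # Assumed form: the +1 keeps defect-free modules (y_true == 0) well defined.
    return float(np.mean(np.abs(y_pred - y_true) / (y_true + 1)))


# Usage on synthetic data: three latent signals, each copied into two near-duplicate features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = np.repeat(latent, 2, axis=1) + 0.05 * rng.normal(size=(200, 6))
y = np.clip(np.round(2 + 1.5 * latent[:, 0]), 0, None)  # synthetic defect counts
selected, labels = select_features(X, y)
print(selected, labels)
```

On this toy data each pair of near-duplicate features forms its own cluster, so the sketch keeps exactly one feature from each pair, which is the redundancy-removal effect the abstract describes.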

Key words: software quality assurance, software defect number prediction, feature selection, empirical studies