Software defect prediction using semi-supervised support vector machine with sampling

doi:10.3778/j.issn.1002-8331.1601-0447

Computer Engineering and Applications, 2017, Vol. 53, Issue (14): 161-166. DOI: 10.3778/j.issn.1002-8331.1601-0447

LIAO Shengping, XU Ling, YAN Meng

School of Software Engineering, Chongqing University, Chongqing 401331, China

Online: 2017-07-15 Published: 2017-08-01

Abstract: Software defect prediction helps improve software quality and allocate testing resources effectively. Two practical problems arise in defect prediction: labeled data is hard to obtain, and the classes are imbalanced. To address both, this paper proposes a sampling-based semi-supervised support vector machine prediction model. The model uses an unsupervised sampling technique to select a small percentage of modules to be tested and labeled, which ensures that the labeled training set does not contain too few defective instances. A semi-supervised support vector machine then builds the predictor from the small labeled set together with unlabeled data, so that the model can exploit the information in the unlabeled data. Experiments on four projects from the public NASA software defect prediction datasets show that the proposed method outperforms existing semi-supervised methods in recall and F-measure, and achieves performance comparable to supervised methods while using fewer labeled training samples.

Key words: software defect prediction, semi-supervised, Safe Semi-Supervised Support Vector Machines (S4VM), class imbalance, sampling
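The abstract reports results in terms of recall and F-measure for the defective class. As a minimal sketch of how these two metrics are commonly computed from confusion-matrix counts (this is the standard F1 definition, not code from the paper; the function name and the counts below are hypothetical):

```python
def recall_and_f_measure(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (recall, F1) for the defective class.

    tp: defective modules predicted defective
    fp: clean modules predicted defective
    fn: defective modules predicted clean
    """
    # Guard against empty denominators (e.g. no defective modules at all).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall == 0.0:
        return recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return recall, f1


# Hypothetical counts, chosen only for illustration.
recall, f1 = recall_and_f_measure(tp=30, fp=20, fn=10)
print(f"recall={recall:.3f}, F1={f1:.3f}")
```

Under class imbalance these metrics are usually more informative than plain accuracy, because a model that labels every module as clean can still score high accuracy while its recall on defective modules is zero.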

LIAO Shengping, XU Ling, YAN Meng. Software defect prediction using semi-supervised support vector machine with sampling[J]. Computer Engineering and Applications, 2017, 53(14): 161-166.
