Software defect prediction using semi-supervised support vector machine with sampling

doi:10.3778/j.issn.1002-8331.1601-0447

Abstract

Abstract: Software defect prediction helps improve software quality and allocate testing resources effectively. To tackle two practical issues in software defect prediction, namely that labeled data is hard to collect and that the classes are imbalanced, a sampling-based semi-supervised support vector machine method is proposed. The method uses an unsupervised sampling approach to select a small percentage of modules to be tested and labeled, which ensures that the training set does not contain too few defective instances. The semi-supervised support vector machine algorithm then builds the predictor from the few labeled instances together with the unlabeled ones, so that the model can exploit the information in the unlabeled data. In an evaluation on four NASA projects, the proposed approach achieves performance comparable to that of supervised learning models while using only a small amount of defect information. Moreover, it outperforms other semi-supervised learning methods in terms of recall and F-measure.

Key words: software defect prediction, semi-supervised, Safe Semi-Supervised Support Vector Machines (S4VM), class imbalance, sampling

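Abstract (translated from the Chinese): Software defect prediction helps improve software development quality and ensures that testing resources are allocated effectively. To address the difficulty of obtaining labeled data and the imbalanced class distribution in software defect prediction research, a sampling-based semi-supervised support vector machine prediction model is proposed. The model uses an unsupervised sampling technique to ensure that the number of defective samples in the labeled data is not too low, and uses a semi-supervised support vector machine method to build the prediction model from a small amount of labeled data together with the information in unlabeled data. Simulation experiments are carried out on the public NASA software defect prediction datasets. The results show that the proposed method outperforms existing semi-supervised methods on both the composite evaluation metric F-measure and recall; compared with supervised methods, it achieves comparable prediction performance with fewer training samples.

Key words (translated from the Chinese): software defect prediction, semi-supervised, Safe semi-supervised support vector machine (S4VM), class imbalance, sampling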
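
The abstract reports results in terms of recall and F-measure. As a minimal sketch of how these two metrics are commonly computed for a binary defect predictor, the snippet below assumes the balanced F-measure (F1) and a label encoding of 1 for defective and 0 for clean modules; the abstract does not state the exact definition used in the paper, and the function name and toy labels are hypothetical.

```python
def recall_precision_f1(y_true, y_pred, positive=1):
    """Return (recall, precision, F1) for the class labeled `positive`.

    Assumption: F-measure is taken as the balanced F1 score; the paper's
    abstract does not specify the weighting.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no defects, or no positive predictions).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical toy data: 3 defective modules out of 10 (class imbalance).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
recall, precision, f1 = recall_precision_f1(y_true, y_pred)
```

Because defective modules are the minority class, recall on that class and the F-measure are more informative than plain accuracy, which a predictor could inflate by labeling every module as clean.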

LIAO Shengping, XU Ling, YAN Meng. Software defect prediction using semi-supervised support vector machine with sampling[J]. Computer Engineering and Applications, 2017, 53(14): 161-166.

廖胜平，徐玲，鄢萌. 基于采样的半监督支持向量机软件缺陷预测方法[J]. 计算机工程与应用, 2017, 53(14): 161-166.

