Computer Engineering and Applications, 2023, Vol. 59, Issue (16): 256-261. DOI: 10.3778/j.issn.1002-8331.2204-0442

• Network, Communication and Security •

Enhanced Supervised Cross-Domain Protocol Defect Prediction Algorithm

ZHOU Chao, WANG Zhen, QIN Futong, LIU Yi   

  1. Unit 63891 of PLA, China
  • Online: 2023-08-15   Published: 2023-08-15

Abstract: Defect prediction for software code is a common research problem, but defect prediction for protocol code has not yet been studied. This paper proposes an enhanced supervised cross-domain protocol defect prediction (ESCPDP) algorithm to address class imbalance and feature redundancy in cross-domain defect prediction. First, the Mean-ReSMOTE algorithm is proposed to handle class imbalance in the dataset. Second, the Hybrid-RFE+ algorithm is proposed to perform feature selection on the oversampled data and obtain the optimal feature subset. Finally, a support vector machine (SVM) is used to build the supervised defect prediction model. The proposed model is validated on the NASA dataset and on a Net protocol defect dataset collected and constructed by the authors, with accuracy (Acc), Recall and F1 as evaluation metrics. Experimental results show that ESCPDP outperforms other classical algorithms and achieves better prediction performance.

Key words: defect prediction, class imbalance, oversampling, feature selection, supervised learning
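The abstract does not give the internals of Mean-ReSMOTE or Hybrid-RFE+, so the following is only a minimal, dependency-free Python sketch of the three-stage pipeline it outlines (oversample, select features, train an SVM, score with Acc/Recall/F1), with standard techniques swapped in: classic SMOTE-style interpolation in place of Mean-ReSMOTE, plain recursive feature elimination (RFE) driven by linear SVM weights in place of Hybrid-RFE+, and a linear SVM fitted by subgradient descent on the hinge loss in place of whatever SVM implementation and kernel the paper uses. All function names, hyperparameters, and the +1 (defective) / -1 (clean) label convention are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE-style oversampling (stand-in for Mean-ReSMOTE): each
    synthetic point lies on the segment between a random minority sample and
    one of its k nearest minority neighbours. Needs >= 2 minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([bi + gap * (ni - bi) for bi, ni in zip(base, nb)])
    return synthetic


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 (defective) / -1 (clean)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:  # inside the margin: hinge term is active
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:  # only the L2 penalty contributes to the subgradient
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, X):
    """Sign of the linear decision function, returned as +1 / -1."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def rfe(X, y, n_keep):
    """Plain RFE (stand-in for Hybrid-RFE+): refit the linear SVM and drop the
    feature with the smallest |weight| until n_keep features remain.
    Returns the indices of the kept columns."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        weakest = min(range(len(active)), key=lambda j: abs(w[j]))
        del active[weakest]
    return active


def acc_recall_f1(y_true, y_pred, positive=1):
    """Accuracy, recall and F1, treating `positive` as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, recall, f1
```

In a typical run one would oversample the defective (minority) class with `smote`, append the synthetic rows with label +1, call `rfe` on the balanced data to choose feature indices, train `train_linear_svm` on those columns, and score predictions on the target domain with `acc_recall_f1`. Pure Python keeps the sketch self-contained; in practice a library implementation such as scikit-learn would replace these helpers.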