Software fault localization model based on linear classification algorithm

doi:10.3778/j.issn.1002-8331.1606-0176

Computer Engineering and Applications, 2017, Vol. 53, Issue (21): 42-48.

HE Haijiang

Department of Mathematics and Computer Science, Changsha University, Changsha 410022, China

Online:2017-11-01 Published:2017-11-15

Abstract: Spectrum-based fault localization (SBFL) techniques help programmers reduce debugging effort. As a lightweight approach, SBFL needs only the coverage information and the pass/fail result of each test case, from which it computes runtime features (the program spectrum) for every statement. Existing SBFL methods combine four runtime features into different suspiciousness formulas. However, these formulas rely on fixed parameters and cannot adapt to different program sets. This paper therefore proposes a machine learning method that automatically determines the suspiciousness formula for a specific program set. First, old versions of a program whose faulty statements have been labeled are collected. Next, the runtime features of faulty statements and correct statements are subtracted pairwise, and each difference forms one sample of the training set. Finally, a linear function is learned with the classification algorithms of Weka and used as the fault localization model of the program. On three benchmark datasets (the Siemens suite, space, and gzip), the models learned with Logistic, SGD, SMO, and LibLinear all outperform SBFL methods, reducing fault localization cost.

Key words: classification algorithm, linear model, fault localization, program spectra, software testing
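
The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the paper learns its models with Weka classifiers (Logistic, SGD, SMO, LibLinear), whereas this sketch swaps in a small hand-written logistic regression trained by gradient descent. Several choices here are assumptions made for illustration. These are the use of raw counts (ef, ep, nf, np) as the four runtime features, Ochiai as the representative fixed SBFL formula, the mirrored negative samples, and all function names and toy data.

```python
import math

def spectrum(coverage, failed):
    """Per-statement counts (ef, ep, nf, np_).
    coverage[t][s] is 1 if test t executed statement s; failed[t] is True if test t failed."""
    total_f = sum(failed)
    total_p = len(failed) - total_f
    feats = []
    for s in range(len(coverage[0])):
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        feats.append((ef, ep, total_f - ef, total_p - ep))
    return feats

def ochiai(ef, ep, nf, np_):
    """Ochiai, one well-known fixed SBFL suspiciousness formula."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def pairwise_samples(feats, faulty):
    """Feature difference (faulty - correct) gets label 1.
    Assumption: the mirrored difference is added with label 0 to balance classes."""
    samples = []
    for i in faulty:
        for j in range(len(feats)):
            if j in faulty:
                continue
            diff = [a - b for a, b in zip(feats[i], feats[j])]
            samples.append((diff, 1))
            samples.append(([-d for d in diff], 0))
    return samples

def train_linear(samples, epochs=200, lr=0.1):
    """Bias-free logistic regression by batch gradient descent (stand-in for Weka)."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w

def score(w, feat):
    """Learned linear suspiciousness of one statement."""
    return sum(wi * xi for wi, xi in zip(w, feat))

# Toy "old version": 6 tests, 4 statements, statement 2 labeled faulty.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
]
failed = [False, True, True, False, True, False]

feats = spectrum(coverage, failed)
w = train_linear(pairwise_samples(feats, {2}))
ranked = sorted(range(len(feats)), key=lambda s: score(w, feats[s]), reverse=True)
print("Ochiai scores:", [round(ochiai(*f), 3) for f in feats])
print("Ranking by learned linear model:", ranked)
```

Here the learned weights are scored against the same toy version they were trained on, only to keep the sketch short. In the setting the abstract describes, the function learned from old versions would be used to rank statements of the program under debugging.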

HE Haijiang. Software fault localization model based on linear classification algorithm[J]. Computer Engineering and Applications, 2017, 53(21): 42-48.
