Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (21): 42-48. DOI: 10.3778/j.issn.1002-8331.1606-0176

Software fault localization model based on linear classification algorithm

HE Haijiang   

  1. Department of Mathematics and Computer Science, Changsha University, Changsha 410022, China
  • Online: 2017-11-01 Published: 2017-11-15

基于线性分类算法的软件错误定位模型

何海江   

  1. 长沙学院 数学与计算机科学系,长沙 410022

Abstract: Spectrum-based fault localization (SBFL) techniques help developers reduce debugging effort. As a lightweight and promising approach, SBFL collects only the pass/fail result of each test case and the corresponding coverage information. From these data, SBFL computes a runtime spectrum for each program statement. SBFL approaches apply various functions to map four spectrum features to a suspiciousness score. However, existing functions rely on fixed parameters, so their accuracy suffers when they are applied to different program sets. Therefore, a machine learning method is proposed that automatically constructs a suspiciousness function for a specific program set. First, old versions of a program whose faulty statements have been labeled are collected. Next, the feature difference between each pair of a faulty statement and a non-faulty statement is mapped to an instance in the training dataset. Finally, a linear classification algorithm from Weka is applied to learn a mapping function. The function learned from the old versions is defined as the fault localization model of the program. To assess the validity of the proposed method, an experiment is performed on three benchmark datasets: the Siemens suite, space and gzip. The experimental results demonstrate that the proposed method reduces fault localization cost compared with existing SBFL approaches.
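
As an illustration of the spectrum computation described above, the following minimal Python sketch derives four per-statement counts from a coverage matrix and test outcomes, then maps them to a suspiciousness score. The abstract does not name the four features or any particular formula; the counts `ef`, `ep`, `nf`, `np_` (failed/passed tests that do or do not execute a statement) and the Ochiai formula are assumptions taken from common SBFL practice, and the toy coverage data are invented for the example.

```python
from math import sqrt

def spectrum(coverage, outcomes):
    """Return (ef, ep, nf, np_) for every statement.

    coverage[t][s] is True if test t executed statement s;
    outcomes[t] is True if test t passed.
    """
    rows = []
    for s in range(len(coverage[0])):
        ef = ep = nf = np_ = 0
        for cov, passed in zip(coverage, outcomes):
            if cov[s] and not passed:
                ef += 1      # executed by a failed test
            elif cov[s] and passed:
                ep += 1      # executed by a passed test
            elif not passed:
                nf += 1      # not executed by a failed test
            else:
                np_ += 1     # not executed by a passed test
        rows.append((ef, ep, nf, np_))
    return rows

def ochiai(ef, ep, nf, np_):
    """Ochiai score, one example of a fixed suspiciousness function."""
    denom = sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# Toy data (hypothetical): 4 tests, 3 statements.
coverage = [
    [True, True, False],   # test 0, passed
    [True, False, True],   # test 1, failed
    [True, False, True],   # test 2, failed
    [True, True, False],   # test 3, passed
]
outcomes = [True, False, False, True]

features = spectrum(coverage, outcomes)
scores = [ochiai(*f) for f in features]
# Statement indices ordered from most to least suspicious.
ranking = sorted(range(len(scores)), key=lambda s: -scores[s])
```

In this toy run the statement executed only by failing tests is ranked first. The learned model proposed in the paper replaces the fixed `ochiai` function with a function fitted to the program set.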

Key words: classification algorithm, linear model, fault localization, program spectra, software testing

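Abstract (translated from the Chinese): Spectrum-based fault localization (SBFL) methods help programmers reduce the difficulty of software debugging. As a lightweight approach, SBFL needs only the coverage information and the results of the test cases to compute the runtime features of each program statement. The many SBFL methods combine four runtime features into different suspiciousness formulas. However, these formulas are affected by fixed parameters and cannot adapt to different program sets. Therefore, a machine learning method is proposed that automatically determines the suspiciousness formula for a specific program set. First, old versions of the program with labeled faulty statements are collected; next, the runtime features of each faulty statement and each correct statement are subtracted pairwise, and each difference forms one sample of the training set; finally, a linear function is learned with the classification algorithms of Weka and serves as the fault localization model of the program. On three benchmark datasets (the Siemens suite, space and gzip), the models learned with Logistic, SGD, SMO and LibLinear all outperform the SBFL methods.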
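
The pairwise training step described in the abstract can be sketched roughly as follows. This is not the paper's implementation: a plain perceptron stands in for the Weka linear classifiers (Logistic, SGD, SMO, LibLinear), the mirrored negative instance for each pair is an assumption, and the feature vectors `(ef, ep, nf, np_)` and all data values are hypothetical.

```python
def pairwise_instances(versions):
    """Build labeled difference vectors from old faulty versions.

    versions is a list of (statement_features, faulty_index) pairs.
    """
    data = []
    for stmts, faulty_idx in versions:
        faulty = stmts[faulty_idx]
        for i, other in enumerate(stmts):
            if i == faulty_idx:
                continue
            diff = [f - o for f, o in zip(faulty, other)]
            data.append((diff, +1))                # faulty minus non-faulty
            data.append(([-d for d in diff], -1))  # mirrored pair (assumption)
    return data

def train_perceptron(data, epochs=100):
    """Learn weights w so that sign(w . x) matches the label of each x."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:   # every training pair is ordered correctly
            break
    return w

def rank(stmts, w):
    """Order statement indices by the learned linear suspiciousness w . x."""
    scores = [sum(wi * xi for wi, xi in zip(w, s)) for s in stmts]
    return sorted(range(len(stmts)), key=lambda i: -scores[i])

# Hypothetical old versions: per-statement (ef, ep, nf, np_) and faulty index.
version_a = ([(1, 3, 2, 2), (3, 1, 0, 4), (0, 2, 3, 3), (2, 4, 1, 0)], 1)
version_b = ([(2, 0, 0, 5), (1, 2, 1, 3), (0, 4, 2, 1)], 0)

weights = train_perceptron(pairwise_instances([version_a, version_b]))

# Apply the learned function to a new version of the same program.
new_version = [(1, 3, 1, 2), (3, 0, 0, 4), (0, 2, 3, 2)]
ranking = rank(new_version, weights)
```

Because the model is linear and trained on differences, a positive score on `faulty - other` means the faulty statement receives the higher suspiciousness, so the same weights can rank the statements of an unseen version directly.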

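Key words (translated from the Chinese): classification algorithm, linear model, fault localization, program spectrum, software testing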