Computer Engineering and Applications, 2016, Vol. 52, Issue (18): 222-227.

• Engineering and Applications •

Hybrid plagiarism detection method in program code based on multiple techniques

YANG Chao   

  1. Basic Teaching and Experimental Department, Hefei University, Hefei 230601, China
  • Online:2016-09-15 Published:2016-09-14

Abstract: Based on an analysis of the characteristics and limitations of existing program-code plagiarism detection systems, a hybrid plagiarism detection method combining text analysis, structure metrics and attribute counting is proposed. First, document fingerprinting and the Winnowing algorithm are used to compute the text similarity of two programs. Second, each program is converted into a Dynamic Control Structure (DCS) tree, and the Winnowing algorithm is applied to compute the similarity of the DCS trees, which gives the structural similarity. Third, information about every variable in the code is collected and counted, and a variable similarity algorithm is applied to the variable information nodes to obtain the variable similarity. Finally, the text, structural and variable similarities are each assigned a weight, and the overall code similarity is computed from them. Experimental results show that the proposed method can effectively detect various kinds of plagiarism. For different plagiarism threshold values, the accuracy and recall of its detection results are higher than those of the JPLAG system. In particular, for groups of structurally simple programs, the average accuracy of the proposed method and of JPLAG is 82.5% and 69.5% respectively, which indicates that the proposed method is more effective.
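The abstract names two general techniques: Winnowing-based document fingerprinting for text similarity, and a weighted sum of the text, structural and variable similarities. The following is a minimal Python sketch of those two ideas only, under stated assumptions. The normalization rules, the k-gram length `K`, the window size `WINDOW`, the Jaccard overlap of fingerprints, and the weights in `WEIGHTS` are hypothetical choices for illustration, not values or formulas taken from the paper. The structural (DCS tree) and variable similarities are treated as given inputs, because the abstract does not describe how they are computed in enough detail to reproduce.

```python
import hashlib
import math
import re

# Hypothetical parameters: the abstract does not state the k-gram length,
# the window size, or the weights used in the paper.
K = 5
WINDOW = 4
WEIGHTS = {"text": 0.4, "structure": 0.4, "variable": 0.2}


def normalize(code: str) -> str:
    """Assumed preprocessing: drop C-style comments and all whitespace, lowercase."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)
    code = re.sub(r"//[^\n]*", "", code)
    return re.sub(r"\s+", "", code).lower()


def kgram_hashes(text: str, k: int = K) -> list[int]:
    """Hash every substring of length k with a stable 64-bit hash."""
    return [
        int.from_bytes(
            hashlib.blake2b(text[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], w: int = WINDOW) -> set[int]:
    """Winnowing: keep the minimum hash of every window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(code: str) -> set[int]:
    """Document fingerprint: the set of hashes selected by winnowing."""
    return winnow(kgram_hashes(normalize(code)))


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of two fingerprints (an assumed similarity measure)."""
    fa, fb = fingerprint(a), fingerprint(b)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


def total_similarity(text: float, structure: float, variable: float) -> float:
    """Weighted sum of the three component similarities (hypothetical weights)."""
    assert math.isclose(sum(WEIGHTS.values()), 1.0)
    return (WEIGHTS["text"] * text
            + WEIGHTS["structure"] * structure
            + WEIGHTS["variable"] * variable)


a = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) s += i; return s; }"
b = """int sum(int n) {
    // same code, reformatted
    int s = 0;
    for (int i = 0; i < n; i++) s += i;
    return s;
}"""
print(text_similarity(a, b))  # 1.0: only whitespace and a comment differ
```

Winnowing guarantees that any shared substring of at least `WINDOW + K - 1` characters (after normalization) yields at least one common fingerprint hash, which is why reformatting and added comments do not hide copied code in this sketch.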

Key words: plagiarism detection, similarity, Winnowing algorithm, structure metrics, attribute counting