Computer Engineering and Applications, 2018, Vol. 54, Issue (11): 53-61. DOI: 10.3778/j.issn.1002-8331.1703-0576

Theory and R&D

Clone code tracking and evolution pattern identifying based on modify log

GE Guangshuai, LIU Dongsheng, ZHANG Liping, HOU Min, BAO Sarenna   

College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China
Online: 2018-06-01  Published: 2018-06-14

Abstract: Most current clone-tracking approaches operate on released versions of the software and therefore lose much of the change information that clone code accumulates during development. In addition, existing definitions of clone evolution patterns are imprecise and do not distinguish between perspectives. This paper proposes a clone-tracking method based on modification logs and identifies clone evolution patterns from three perspectives: clone class, clone fragment, and clone code content. First, each commit is treated as a small version, and clones in every version are detected with NiCad. Second, clone classes are mapped initially by a similarity measure based on the Levenshtein (edit) distance between token sequences. Third, clone fragments are mapped precisely using the modification logs. Fourth, the clone-class mapping is revised according to the fragment-mapping results. Finally, clone evolution patterns are identified from each perspective. Experiments on nearly 8,000 versions of six open-source systems show that more than 97% of clones evolve in the "stable" pattern; the "separate", "merge", and "complex" patterns each account for no more than 0.01%, and the "consistent change" and "inconsistent change" patterns each account for no more than 2%. In comparative experiments on several systems against gCad, a well-regarded tool of the same kind, the proposed method achieves 2% higher recall and 2% higher precision, and it runs faster than gCad in the same environment.
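The initial clone-class mapping step (token-based edit-distance similarity) can be illustrated with a short Python sketch. It assumes that each clone class is represented by a single code string (for example, the text of one of its fragments taken from the NiCad report), tokenizes with a simple regular expression, normalizes the token-level Levenshtein distance by the length of the longer token sequence, and maps classes greedily one-to-one above a similarity threshold of 0.7. The tokenizer, the normalization, the greedy strategy, the threshold, and the names `tokenize`, `levenshtein`, `similarity`, and `map_clone_classes` are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import Dict, List

# Hypothetical tokenizer: identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    return TOKEN_RE.findall(code)


def levenshtein(a: List[str], b: List[str]) -> int:
    """Dynamic-programming edit distance over token sequences (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cost = 0 if ta == tb else 1
            cur.append(min(prev[j] + 1,          # delete ta
                           cur[j - 1] + 1,       # insert tb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / longer length, so identical token sequences score 1.0."""
    ta, tb = tokenize(a), tokenize(b)
    longest = max(len(ta), len(tb))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ta, tb) / longest


def map_clone_classes(old: Dict[str, str], new: Dict[str, str],
                      threshold: float = 0.7) -> Dict[str, str]:
    """Greedily map each old clone class to its most similar, not yet used new class."""
    mapping: Dict[str, str] = {}
    used = set()
    for old_id, old_code in old.items():
        best_id, best_sim = None, -1.0
        for new_id, new_code in new.items():
            if new_id in used:
                continue
            sim = similarity(old_code, new_code)
            if sim > best_sim:
                best_id, best_sim = new_id, sim
        if best_id is not None and best_sim >= threshold:
            mapping[old_id] = best_id
            used.add(best_id)
    return mapping


if __name__ == "__main__":
    old = {"c1": "int add(int a, int b) { return a + b; }"}
    new = {"k7": "int add(int x, int y) { return x + y; }",
           "k8": "void log(String s) { print(s); }"}
    print(similarity(old["c1"], new["k7"]))  # 0.75 (4 of 16 tokens renamed)
    print(map_clone_classes(old, new))       # {'c1': 'k7'}
```

A one-to-one mapping is deliberately coarse here; in the method described above, splits and merges are recovered later, when the fragment-level mapping is used to revise the class mapping.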

Key words: clone code, clone tracking, evolution pattern, modify log
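The log-based fragment mapping and the final pattern identification described in the abstract can likewise be sketched in Python. This sketch assumes that the modification log of each commit is available as unified-diff hunk headers (such as those printed by `git diff --unified=0`), that the hunks of one file do not overlap, and that a clone fragment is a 1-based inclusive line range within one file. Each fragment line is carried through the hunks; a fragment is `deleted` if none of its lines survives and `modified` if any hunk touches it. The class-level rules in `classify_class` are a simplified, hypothetical reading of the pattern names in the abstract; in particular, "consistent change" here only means that every surviving fragment was modified, not that they were modified identically. All names (`Hunk`, `parse_hunk_header`, `map_line`, `touches`, `map_fragment`, `classify_class`) are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    """Hypothetical hunk model: old_len lines starting at old line old_start are
    replaced by new_len lines; old_len == 0 means new_len lines are inserted
    before old line old_start."""
    old_start: int
    old_len: int
    new_len: int


# Unified-diff hunk header, e.g. "@@ -15 +17,3 @@" (an omitted count means 1).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Hunk:
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start = int(m.group(1))
    old_len = int(m.group(2)) if m.group(2) is not None else 1
    new_len = int(m.group(4)) if m.group(4) is not None else 1
    if old_len == 0:
        old_start += 1  # "-a,0" means "after old line a", i.e. before line a + 1
    return Hunk(old_start, old_len, new_len)


def map_line(x: int, hunks: List[Hunk]) -> Optional[int]:
    """New line number of old line x, or None if x was replaced or deleted."""
    offset = 0
    for h in sorted(hunks, key=lambda hk: hk.old_start):
        if h.old_len == 0:                # pure insertion before old line h.old_start
            if h.old_start <= x:
                offset += h.new_len
            else:
                break
        else:
            last = h.old_start + h.old_len - 1
            if x > last:                  # hunk lies entirely above x: shift x
                offset += h.new_len - h.old_len
            elif x >= h.old_start:        # x itself falls inside the hunk
                return None
            else:                         # all remaining hunks lie below x
                break
    return x + offset


def touches(h: Hunk, start: int, end: int) -> bool:
    if h.old_len == 0:
        return start < h.old_start <= end  # lines inserted strictly inside the fragment
    return h.old_start <= end and h.old_start + h.old_len - 1 >= start


def map_fragment(start: int, end: int,
                 hunks: List[Hunk]) -> Tuple[Optional[Tuple[int, int]], str]:
    """Map a fragment's line range across one commit and report its status."""
    mapped = [m for m in (map_line(x, hunks) for x in range(start, end + 1))
              if m is not None]
    if not mapped:
        return None, "deleted"
    status = "modified" if any(touches(h, start, end) for h in hunks) else "unchanged"
    return (min(mapped), max(mapped)), status


def classify_class(statuses: List[str], n_successors: int, n_predecessors: int) -> str:
    """Hypothetical class-level rules. statuses holds "unchanged"/"modified" for the
    fragments that survive into the new version."""
    if n_successors > 1 and n_predecessors > 1:
        return "complex"
    if n_successors > 1:
        return "separate"
    if n_predecessors > 1:
        return "merge"
    changed = statuses.count("modified")
    if changed == 0:
        return "stable"
    if changed == len(statuses):
        return "consistent change"
    return "inconsistent change"


if __name__ == "__main__":
    hunks = [parse_hunk_header("@@ -2,0 +3,2 @@"), parse_hunk_header("@@ -15 +17,3 @@")]
    print(map_fragment(10, 20, hunks))  # ((12, 24), 'modified')
    print(map_fragment(30, 40, hunks))  # ((34, 44), 'unchanged')
```

The mapped range is only an estimate of where the fragment lies after the commit; in practice it would be matched against the clone fragments that the detector reports for the new version, which is the point at which the class mapping could be revised.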