Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (33): 79-84.

Comparison and analysis on binary file similarity detection technique

CHEN Hui1,2, GUO Tao3, CUI Baojiang2, WANG Jianxin4   

  1. School of Computer Science and Technology, Shandong Yingcai University, Jinan 250104, China
  2. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  3. China Information Technology Security Evaluation Center, Beijing 100085, China
  4. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
  • Online: 2012-11-21  Published: 2012-11-20

二进制文件相似性检测技术对比分析

陈  慧1,2,郭  涛3,崔宝江2,王建新4   

  1. 山东英才学院 计算机学院,济南 250104
  2. 北京邮电大学 计算机学院,北京 100876
  3. 中国信息安全测评中心,北京 100085
  4. 北京林业大学 信息学院,北京 100083

Abstract: Traditional file similarity detection techniques are generally based on source code. For cases where source code is unavailable, binary comparison techniques have been proposed for clone detection. Four binary file similarity detection techniques and the main detection tools are summarized and analyzed. Experiments are carried out using an evaluation method for binary file clone comparison. The evaluation method covers binary file clone types, detection approaches, and the standard for calculating similarity. The experiments show that, for continuous clones, division clones that do not affect call relations, and equivalent replacement clones that affect neither the number of basic blocks nor the call relations, similarity detection on binary files gives more accurate results than token-based similarity detection on source code files.

Key words: homologous software, binary file comparison, clone detection

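Abstract (translated from the Chinese): Traditional file similarity detection is based on source code. Because source code is often difficult to obtain, binary file comparison has been proposed and is receiving increasing attention. Four binary file similarity detection techniques and the mainstream detection tools are summarized and analyzed. An evaluation method for binary file clone comparison is proposed, and experimental tests are carried out on that basis. Following the classification of binary file clones, the method defines the experimental procedure and the standard for calculating similarity. The results show that, for continuous clones, division clones that do not affect call relations, and equivalent replacement clones that affect neither the number of basic blocks nor the call relations, binary file similarity detection yields more accurate results than token-based similarity detection on source code files.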

Key words (translated from the Chinese): software homology, binary file comparison, clone detection