Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (1): 257-260.

Mathematical formula plagiarism detection algorithm based on binary tree

QIN Yuping1, TANG Yawei2, LUN Shuxian3, WANG Xiukun4   

  1. College of Engineering, Bohai University, Jinzhou, Liaoning 121000, China
  2. College of Information Science and Technology, Bohai University, Jinzhou, Liaoning 121000, China
  3. New Energy College, Bohai University, Jinzhou, Liaoning 121000, China
  4. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2015-01-01 Published: 2015-01-06

A binary-tree-based algorithm for detecting plagiarism of mathematical formulas

秦玉平1,唐亚伟2,伦淑娴3,王秀坤4   

  1. 渤海大学 工学院,辽宁 锦州 121000
  2. 渤海大学 信息科学与技术学院,辽宁 锦州 121000
  3. 渤海大学 新能源学院,辽宁 锦州 121000
  4. 大连理工大学 计算机科学与技术学院,辽宁 大连 116024

Abstract: A plagiarism detection algorithm for mathematical formulas based on binary trees is proposed. First, each mathematical formula is extracted from the document under test, a binary tree is generated from the formula's LaTeX form, and the tree structure is normalized to obtain a structure code. The algorithm then looks up the table named after that structure code. If the table exists, it is searched for a record that matches both the formula element at the root node and the pre-order traversal sequence of formula elements with variable names normalized. The search result determines whether the formula is plagiarized. Experimental results show that the algorithm detects plagiarism of mathematical formulas accurately, which makes it a practical algorithm.

Key words: mathematical formula, plagiarism detection, binary tree, normalization, structure code

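Abstract (translated from the Chinese): A binary-tree-based plagiarism detection algorithm for mathematical formulas in LaTeX format is proposed. Mathematical formulas are extracted from the document under test; a binary tree representation of each formula is generated from its LaTeX form, and the tree structure is normalized to obtain a structure code. The formula detection library is searched for a data table whose file name is that structure code; if the table exists, it is searched for a record whose root-node formula element and variable-name-normalized pre-order traversal sequence are both identical to those of the binary tree. Whether the formula is plagiarized is determined from the search result. Experimental results show that the algorithm detects plagiarism of mathematical formulas accurately and is a fairly practical algorithm.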

Key words (translated from the Chinese): mathematical formula, plagiarism detection, binary tree, normalization, structure code
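
The lookup described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the structure code is computed, how variables are recognized, or how the tables are stored. Here the structure code is assumed to be a pre-order shape string, a variable is assumed to be a single ASCII letter, the detection library is assumed to be an in-memory dictionary keyed by structure code, and trees are built by hand rather than parsed from LaTeX. All names (`Node`, `structure_code`, `normalized_preorder`, `add_formula`, `is_plagiarized`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    element: str                      # formula element: operator, number, or variable
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def structure_code(node):
    """Assumed encoding: pre-order shape string, '1' for a node, '0' for a missing child."""
    if node is None:
        return "0"
    return "1" + structure_code(node.left) + structure_code(node.right)


def is_variable(element):
    # Assumption: a variable is a single ASCII letter.
    return len(element) == 1 and element.isascii() and element.isalpha()


def normalized_preorder(node):
    """Pre-order element sequence with variables renamed v1, v2, ... by first appearance."""
    names = {}
    sequence = []

    def visit(n):
        if n is None:
            return
        if is_variable(n.element):
            names.setdefault(n.element, f"v{len(names) + 1}")
            sequence.append(names[n.element])
        else:
            sequence.append(n.element)
        visit(n.left)
        visit(n.right)

    visit(node)
    return tuple(sequence)


def add_formula(library, tree):
    """Store a known formula in the table named by its structure code."""
    table = library.setdefault(structure_code(tree), set())
    table.add((tree.element, normalized_preorder(tree)))


def is_plagiarized(library, tree):
    """Find the table by structure code, then match root element and normalized sequence."""
    table = library.get(structure_code(tree))
    if table is None:
        return False
    return (tree.element, normalized_preorder(tree)) in table


# Usage: register a + b * c, then test candidate formulas against it.
library = {}
add_formula(library, Node("+", Node("a"), Node("*", Node("b"), Node("c"))))

renamed = Node("+", Node("x"), Node("*", Node("y"), Node("z")))       # x + y * z
regrouped = Node("*", Node("+", Node("a"), Node("b")), Node("c"))     # (a + b) * c
print(is_plagiarized(library, renamed))     # True: same shape, same normalized sequence
print(is_plagiarized(library, regrouped))   # False: no table for this structure code
```

Keying the library by structure code means a candidate formula is only compared against stored formulas with the same tree shape, which is the role the abstract gives to the per-structure-code tables.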