Study on model for plagiarism-detection of scientific papers based on sentence similarity

Computer Engineering and Applications, 2011, Vol. 47, Issue (24): 199-201.

• Graphics, Images, and Pattern Recognition •

LENG Qiangkui1, QIN Yuping1, WANG Chunli2

1. College of Information Science and Engineering, Bohai University, Jinzhou, Liaoning 121000, China
2. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-08-21 Published: 2011-08-21

Abstract

A model for detecting plagiarism in scientific papers based on sentence similarity is proposed. A Local Word-Frequency Fingerprint (LWFF) algorithm first screens a large document collection quickly to find documents suspected of plagiarism. Sentence similarity between source texts and suspect texts is then computed with the Longest Sorted Common Subsequence (LSCS) algorithm, and the plagiarized passages are marked to provide evidence. Experiments on the standard Chinese dataset SOGOU-T show that the model has a strong ability to mine local information and, to some extent, overcomes the low precision of existing plagiarism-detection algorithms for papers.

Key words: sentence similarity, plagiarism detection, local word frequency, Longest Sorted Common Subsequence (LSCS)
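
The two-stage idea in the abstract (a fast fingerprint screen followed by sentence-level comparison) can be illustrated with a minimal Python sketch. The abstract does not give the details of LWFF or LSCS, so everything here is an assumption for illustration only: the fingerprint is taken to be the most frequent words of each fixed-size block, the classic longest common subsequence stands in for LSCS, and all function names, `block_size`, `top_k`, and `threshold` are hypothetical. Input is assumed to be already segmented into word lists.

```python
from collections import Counter


def local_fingerprint(words, block_size=50, top_k=5):
    """Collect the top_k most frequent words of each block (assumed scheme, not the paper's LWFF)."""
    fingerprint = set()
    for start in range(0, len(words), block_size):
        counts = Counter(words[start:start + block_size])
        # Sort by descending count, then by word, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        fingerprint.update(word for word, _ in ranked[:top_k])
    return fingerprint


def fingerprint_overlap(fp_a, fp_b):
    """Jaccard overlap of two fingerprints, in [0, 1]."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def lcs_length(a, b):
    """Length of the classic longest common subsequence (stand-in for LSCS)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sentence_similarity(a, b):
    """LCS length normalised by the longer sentence, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def flag_sentences(source, suspect, threshold=0.6):
    """Return (suspect_index, source_index, score) for suspect sentences whose best match reaches threshold."""
    hits = []
    for i, sentence in enumerate(suspect):
        best_j, best = max(
            ((j, sentence_similarity(sentence, t)) for j, t in enumerate(source)),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if best_j is not None and best >= threshold:
            hits.append((i, best_j, best))
    return hits
```

In such a pipeline, `fingerprint_overlap` would be used to shortlist suspect documents cheaply, and `flag_sentences` would then be run only on the shortlisted pairs to mark which sentences match and how closely.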

LENG Qiangkui, QIN Yuping, WANG Chunli. Study on model for plagiarism-detection of scientific papers based on sentence similarity[J]. Computer Engineering and Applications, 2011, 47(24): 199-201.

