计算机工程与应用 ›› 2023, Vol. 59 ›› Issue (16): 101-107. DOI: 10.3778/j.issn.1002-8331.2205-0193

• Pattern Recognition and Artificial Intelligence •


Multi-Level Semantic Interaction Model for Printer Source Identification

QIU Yawen, ZOU Jixin, ZHU Ziqi   

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China
  2. Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
  • Online: 2023-08-15    Published: 2023-08-15

Abstract: Printer source identification is an important forensic technique in document examination. Structural differences between characters and stylistic differences between fonts in printed documents make it difficult to extract printed-text features and to analyze printer-specific traits. To reduce the pronounced intra-class variation caused by character structure in documents from printers of the same type, a micro-scale feature enhancement method based on spatial image reorganization is proposed: by reorganizing the glyph structure of the image, it weakens the large-scale structural features arising from character differences and strengthens attention to the basic elements of characters (e.g., strokes), thereby emphasizing the features that discriminate between printer types. To further address the style variation introduced by different font sizes and typefaces, a deep-learning-based multi-level semantic interaction network (MSINet) is proposed, which constructs interactions between features at different levels to reduce the impact of stylistic differences in printed characters. The effectiveness of the proposed method is verified on the Printing Technique Dataset, where the recognition accuracy reaches 99.4%, higher than that of current mainstream text-independent printer source identification methods.
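As an illustration only (the paper does not publish code), the following is a minimal sketch of the spatial image reorganization idea described in the abstract: a printed-character image is split into small tiles that are randomly re-arranged, weakening large-scale glyph structure while preserving micro-scale printing traces such as stroke edges and toner texture. The patch size and the use of a uniform random permutation are assumptions, not the paper's specification.

```python
# Minimal sketch of spatial image reorganization (illustrative; patch size and
# random permutation are assumptions, not the authors' published procedure).
import numpy as np

def reorganize_image(img: np.ndarray, patch: int = 16, rng=None) -> np.ndarray:
    """Split an HxW grayscale image into patch x patch tiles and shuffle them."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    # Crop so the image divides evenly into patches.
    h, w = h - h % patch, w - w % patch
    img = img[:h, :w]
    # Rearrange into a (rows, cols, patch, patch) grid of tiles.
    tiles = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    tiles = tiles.reshape(-1, patch, patch)
    # Randomly permute tile positions to break character-level structure
    # while keeping micro-scale print artifacts inside each tile.
    tiles = tiles[rng.permutation(len(tiles))]
    # Reassemble the shuffled tiles into an image of the same (cropped) size.
    tiles = tiles.reshape(h // patch, w // patch, patch, patch).swapaxes(1, 2)
    return tiles.reshape(h, w)
```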

Key words: document forensics, printer source identification, feature fusion, deep learning, convolutional neural networks
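Likewise, the sketch below shows one way the multi-level semantic interaction described in the abstract could be realized in PyTorch, assuming a pair of shallow (detail) and deep (semantic) CNN features, 1×1 projections to a common width, and a sigmoid gate as the interaction; the actual MSINet architecture, backbone, channel widths, and fusion rule are not specified here and may differ.

```python
# Hypothetical multi-level feature interaction module (MSINet-style fusion);
# the gating form and channel sizes are assumptions for illustration.
import torch
import torch.nn as nn

class MultiLevelInteraction(nn.Module):
    def __init__(self, c_shallow: int, c_deep: int, c_out: int, num_classes: int):
        super().__init__()
        self.proj_shallow = nn.Conv2d(c_shallow, c_out, kernel_size=1)
        self.proj_deep = nn.Conv2d(c_deep, c_out, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(c_out, c_out, kernel_size=1), nn.Sigmoid())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2 * c_out, num_classes)

    def forward(self, f_shallow: torch.Tensor, f_deep: torch.Tensor) -> torch.Tensor:
        s = self.proj_shallow(f_shallow)          # (B, c_out, Hs, Ws)
        d = self.proj_deep(f_deep)                # (B, c_out, Hd, Wd)
        # Upsample the deep feature to the shallow resolution for interaction.
        d_up = nn.functional.interpolate(d, size=s.shape[-2:], mode="bilinear",
                                         align_corners=False)
        # Deep semantics gate the shallow detail response (one possible
        # interaction; the paper may use a different formulation).
        s_gated = s * self.gate(d_up)
        # Pool both levels and classify over the fused descriptor.
        fused = torch.cat([self.pool(s_gated).flatten(1),
                           self.pool(d).flatten(1)], dim=1)
        return self.fc(fused)
```

In this sketch, f_shallow and f_deep would come from an intermediate and a final stage of a CNN backbone applied to the reorganized character images, so that fine print artifacts and higher-level semantics are combined before classification.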