计算机工程与应用 ›› 2023, Vol. 59 ›› Issue (16): 101-107. DOI: 10.3778/j.issn.1002-8331.2205-0193

• Pattern Recognition and Artificial Intelligence •


Multi-Level Semantic Interaction Model for Printer Source Identification

QIU Yawen, ZOU Jixin, ZHU Ziqi   

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China
  2. Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
  • Online: 2023-08-15    Published: 2023-08-15

Abstract: Printer source identification is an important forensic technique in document examination. Structural differences between characters and stylistic differences between fonts in printed documents make it difficult to extract printed-text features and to analyze printer-specific traits. To reduce the pronounced intra-class variation caused by character structure in documents from printers of the same type, a micro-scale feature enhancement method based on spatial image reorganization is proposed: by reorganizing the glyph structure of the image, it weakens the large-scale structural features arising from character differences and strengthens attention to the basic elements of characters (e.g., strokes), thereby emphasizing the features that discriminate between printer types. To further address the style variation introduced by different font sizes and typefaces, a deep-learning-based multi-level semantic interaction network (MSINet) is proposed, which constructs interactions between features at different levels to reduce the impact of stylistic differences in printed characters. The effectiveness of the proposed method is verified on the Printing Technique Dataset, where the recognition accuracy reaches 99.4%, higher than that of current mainstream text-independent printer source identification methods.
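As an illustration only (the paper does not publish code), the following is a minimal sketch of the spatial image reorganization idea described in the abstract: a printed-character image is split into small tiles that are randomly re-arranged, weakening large-scale glyph structure while preserving micro-scale printing traces such as stroke edges and toner texture. The patch size and the use of a uniform random permutation are assumptions, not the paper's specification.

```python
# Minimal sketch of spatial image reorganization (illustrative; patch size and
# random permutation are assumptions, not the authors' published procedure).
import numpy as np

def reorganize_image(img: np.ndarray, patch: int = 16, rng=None) -> np.ndarray:
    """Split an HxW grayscale image into patch x patch tiles and shuffle them."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    # Crop so the image divides evenly into patches.
    h, w = h - h % patch, w - w % patch
    img = img[:h, :w]
    # Rearrange into a (rows, cols, patch, patch) grid of tiles.
    tiles = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    tiles = tiles.reshape(-1, patch, patch)
    # Randomly permute tile positions to break character-level structure
    # while keeping micro-scale print artifacts inside each tile.
    tiles = tiles[rng.permutation(len(tiles))]
    # Reassemble the shuffled tiles into an image of the same (cropped) size.
    tiles = tiles.reshape(h // patch, w // patch, patch, patch).swapaxes(1, 2)
    return tiles.reshape(h, w)
```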

Key words: document forensics, printer source identification, feature fusion, deep learning, convolutional neural networks
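Likewise, the sketch below shows one way the multi-level semantic interaction described in the abstract could be realized in PyTorch, assuming a pair of shallow (detail) and deep (semantic) CNN features, 1×1 projections to a common width, and a sigmoid gate as the interaction; the actual MSINet architecture, backbone, channel widths, and fusion rule are not specified here and may differ.

```python
# Hypothetical multi-level feature interaction module (MSINet-style fusion);
# the gating form and channel sizes are assumptions for illustration.
import torch
import torch.nn as nn

class MultiLevelInteraction(nn.Module):
    def __init__(self, c_shallow: int, c_deep: int, c_out: int, num_classes: int):
        super().__init__()
        self.proj_shallow = nn.Conv2d(c_shallow, c_out, kernel_size=1)
        self.proj_deep = nn.Conv2d(c_deep, c_out, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(c_out, c_out, kernel_size=1), nn.Sigmoid())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2 * c_out, num_classes)

    def forward(self, f_shallow: torch.Tensor, f_deep: torch.Tensor) -> torch.Tensor:
        s = self.proj_shallow(f_shallow)          # (B, c_out, Hs, Ws)
        d = self.proj_deep(f_deep)                # (B, c_out, Hd, Wd)
        # Upsample the deep feature to the shallow resolution for interaction.
        d_up = nn.functional.interpolate(d, size=s.shape[-2:], mode="bilinear",
                                         align_corners=False)
        # Deep semantics gate the shallow detail response (one possible
        # interaction; the paper may use a different formulation).
        s_gated = s * self.gate(d_up)
        # Pool both levels and classify over the fused descriptor.
        fused = torch.cat([self.pool(s_gated).flatten(1),
                           self.pool(d).flatten(1)], dim=1)
        return self.fc(fused)
```

In this sketch, f_shallow and f_deep would come from an intermediate and a final stage of a CNN backbone applied to the reorganized character images, so that fine print artifacts and higher-level semantics are combined before classification.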