Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (22): 210-218. DOI: 10.3778/j.issn.1002-8331.2104-0312

• Graphics and Image Processing •

DPENet: Lightweight Document Pose Estimation Network

HAN Jing, LYU Xueqiang, ZHANG Xiangxiang, HAO Wei, ZHANG Kai   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China
  2. Research Center for Language Intelligence of China, Capital Normal University, Beijing 100048, China
  • Online: 2022-11-15 Published: 2022-11-15

Abstract: Existing deep learning models for rectifying documents with perspective skew suffer from poor spatial generalization, large parameter counts, and slow inference. Approaching the task as pose estimation, this paper proposes DPENet, a lightweight document pose estimation network, to address these problems. The model treats the single document in a document image as a pose estimation object and its four corner points as the four pose estimation points. A DSNT (differentiable spatial to numerical transform) module, which combines the advantages of fully connected regression and Gaussian heatmap regression, localizes the corner points with high precision, and a perspective transformation then rectifies the deformed document image. DPENet is lightweight by design: it uses the mobile-oriented MobileNet V2 as its backbone network, and the model size is only 10.6 MB. In comparative experiments against three mainstream networks on SmartDoc-QA (a subset of 148 document images), DPENet achieves a better rectification success rate (96.6%) and a lower mean displacement error (MDE, 1.28 pixels) than the other three networks, and its average rectification speed is also good. DPENet thus delivers a higher rectification success rate and accuracy for deformed documents while remaining lightweight and fast.

Key words: pose estimation, deep learning, document image rectification, lightweight network, MobileNet V2
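
The corner-localization step and the MDE metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization, the function names (`dsnt`, `to_pixels`, `mean_displacement_error`), and the reading of MDE as the mean Euclidean distance between predicted and ground-truth corners are assumptions made for illustration. The backbone network and the final perspective transformation are omitted.

```python
import math


def dsnt(heatmap):
    """Turn one raw 2D score map (a list of rows) into normalized (x, y).

    Assumption: the map is normalized with a softmax. The coordinate is the
    expected value of a grid whose pixel centers lie in (-1, 1).
    """
    h, w = len(heatmap), len(heatmap[0])
    peak = max(max(row) for row in heatmap)  # subtract the max for stability
    exp = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exp)
    x = sum(exp[i][j] / total * (2 * j + 1 - w) / w
            for i in range(h) for j in range(w))
    y = sum(exp[i][j] / total * (2 * i + 1 - h) / h
            for i in range(h) for j in range(w))
    return x, y


def to_pixels(point, width, height):
    """Convert normalized coordinates back to pixel-center indices."""
    x, y = point
    return ((x + 1) * width / 2 - 0.5, (y + 1) * height / 2 - 0.5)


def mean_displacement_error(pred, truth):
    """Assumed MDE: mean Euclidean distance over corresponding corners."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


if __name__ == "__main__":
    # Hypothetical 5x5 score map with a sharp peak at row 1, column 3.
    scores = [[0.0] * 5 for _ in range(5)]
    scores[1][3] = 50.0
    corner = to_pixels(dsnt(scores), 5, 5)
    print(corner)
    print(mean_displacement_error([corner], [(3.0, 1.0)]))
```

Because the output is an expectation over the whole map rather than an argmax, it is differentiable and can resolve sub-pixel positions, which is the property that lets this kind of layer combine heatmap-style spatial reasoning with direct coordinate regression. In a full pipeline, one such map would be produced per corner, and the four resulting points would feed the perspective transformation.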