Computer Engineering and Applications, 2025, Vol. 61, Issue (5): 250-260. DOI: 10.3778/j.issn.1002-8331.2310-0243

• Graphics and Image Processing •

Asymmetric Iterative Refinement Prediction Network for Scene Text Detection

LIAN Zhe, YIN Yanjun, MI Zeng, ZHI Min, XU Qiaozhi   

School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
Online: 2025-03-01; Published: 2025-03-01

Abstract: Scene text detection is a fundamental research topic in image processing with wide practical value. DBNet, a representative algorithm in this field, has three weaknesses: its post-processing for reconstructing text instances is overly simple, it tends to produce false detections on text whose aspect ratio varies significantly, and it tends to miss small text. To address these problems, this paper designs and proposes AIRPNet, an asymmetric iterative refinement prediction network for scene text detection. The model is built on the ResNet50 feature extraction network; it replaces regular convolutions with deformable convolutions to adapt to irregular text features and adjusts the number of stacked blocks so that the features carried by each layer are more appropriate. It adopts the recursive idea of RFP (Recursive Feature Pyramid) to fuse multi-level features more fully, uses the designed asymmetric iterative refinement prediction module to construct more accurate probability maps, and reconstructs text instance boundaries with differentiable binarization post-processing. Several alternative structures for the asymmetric iterative refinement prediction module are designed and explored. To evaluate its effectiveness, the proposed model is compared with state-of-the-art mainstream models on three datasets, achieving F-measures of 88.7%, 88.4%, and 84.9% on ICDAR2015, MSRA-TD500, and CTW1500, respectively, which matches or approaches SOTA performance. The experimental results show that the proposed model captures more accurate probability maps and therefore constructs more complete text bounding boxes.
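Differentiable binarization, the post-processing step AIRPNet uses to turn its probability map into text boundaries, replaces a hard step function with a steep sigmoid of the gap between the probability map P and a threshold map T: B = 1 / (1 + e^(-k(P - T))). The sketch below is a minimal pure-Python illustration under stated assumptions: the amplifying factor k = 50 is DBNet's published setting and is only assumed to carry over to AIRPNet, and the function name, constant name, and list-of-lists map representation are hypothetical choices for illustration, not the authors' code.

```python
import math

# Amplifying factor k. DBNet's paper uses k = 50; that AIRPNet keeps this
# value is an assumption, not something the abstract states.
DEFAULT_K = 50.0


def differentiable_binarize(prob_map, thresh_map, k=DEFAULT_K):
    """Elementwise approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map (P) and thresh_map (T) are equally sized 2D lists of floats in
    [0, 1]. Unlike a hard step at T, this sigmoid has a nonzero gradient, so
    the threshold map can be learned jointly with the probability map.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical 1x3 maps: a confident text pixel, a background pixel, and a
# pixel sitting exactly on its threshold.
P = [[0.9, 0.1, 0.3]]
T = [[0.3, 0.3, 0.3]]
B = differentiable_binarize(P, T)
print([round(b, 4) for b in B[0]])  # prints [1.0, 0.0, 0.5]
```

Because k is large, B is close to 0 or 1 everywhere except near the threshold, so it behaves like a binary mask while remaining differentiable during training.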

Key words: text detection, recursive pyramid, asymmetric convolution, iterative refinement prediction, differentiable binarization
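The keyword "asymmetric convolution" refers to kernels whose height and width differ, such as 1×k and k×1. The abstract does not say which shapes or arrangement the asymmetric iterative refinement prediction module uses, so the following is only a generic sketch of one common form: a 1×3 kernel followed by a 3×1 kernel covers the same 3×3 receptive field as a square kernel with 6 weights instead of 9, and is exactly equivalent to a 3×3 convolution whose kernel is the outer product of the two. The kernels, the test image, and the helper names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation (what CNN 'convolution' layers compute), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def outer(col, row):
    """Outer product col x row, giving a rank-1 len(col) x len(row) kernel."""
    return [[c * r for r in row] for c in col]


horizontal = [[1.0, 2.0, 1.0]]     # 1x3 kernel: mixes context along a text line
vertical = [[1.0], [0.0], [-1.0]]  # 3x1 kernel: mixes context across rows
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 ramp

# Asymmetric pair applied in sequence: 1x3 then 3x1 (6 weights in total).
two_step = conv2d_valid(conv2d_valid(image, horizontal), vertical)
# Equivalent single 3x3 kernel (9 weights), rank 1 by construction.
full = conv2d_valid(image, outer([1.0, 0.0, -1.0], [1.0, 2.0, 1.0]))

print(two_step == full)  # prints True
print(two_step[0])       # prints [-40.0, -40.0, -40.0]
```

The trade-off is expressiveness: a factorized 1×k and k×1 pair can only represent rank-1 square kernels, which is one reason some designs run asymmetric branches alongside a square kernel rather than replacing it.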