Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (2): 154-161. DOI: 10.3778/j.issn.1002-8331.2210-0077

• Pattern Recognition and Artificial Intelligence •

Scene Text Detection Based on Multi-Scale Pooling and Bidirectional Feature Fusion

WEI Zheliang, LI Yueyang, LUO Haichi

  1. College of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. College of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-01-15 Published: 2024-01-15

Abstract: Text in natural scenes appears against complex, varied backgrounds and differs widely in shape and size. To address this problem, a new segmentation-based scene text detection network is proposed. Two modules are built to improve network performance: multi-scale pooling and bidirectional feature fusion. Based on the characteristics of text instances, the multi-scale pooling module applies spatial pooling with windows of different aspect ratios to capture dependencies in text information over different distances, guiding the network toward more accurate segmentation results. The bidirectional feature fusion module constructs two fusion paths in different directions to make better use of the backbone network's features at different scales and to improve detection of text at different scales. Experimental results demonstrate the effectiveness of the proposed method: on the public ICDAR2015, MSRA-TD500, and Total-Text datasets, it achieves F-measure values of 87.7%, 86.7%, and 85.5%, respectively.

Key words: text detection, image segmentation, multi-scale pooling, bidirectional feature fusion
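
The idea of pooling with windows of different aspect ratios can be illustrated with a small sketch. The code below is not the authors' module; it is a minimal pure-Python illustration under several assumptions: average pooling with stride 1, windows clipped at the borders, and fusion of the pooled maps by simple averaging. The names `avg_pool` and `multi_scale_pool` and the window sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def avg_pool(feature: Grid, window: Tuple[int, int]) -> Grid:
    """Average-pool a 2D map with a (height, width) window.

    Stride is 1 and the output has the same size as the input;
    windows are clipped where they cross the border.
    """
    rows, cols = len(feature), len(feature[0])
    win_h, win_w = window
    top, left = win_h // 2, win_w // 2
    pooled = []
    for r in range(rows):
        r0, r1 = max(0, r - top), min(rows, r - top + win_h)
        out_row = []
        for c in range(cols):
            c0, c1 = max(0, c - left), min(cols, c - left + win_w)
            vals = [feature[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out_row.append(sum(vals) / len(vals))
        pooled.append(out_row)
    return pooled


def multi_scale_pool(feature: Grid, windows: Sequence[Tuple[int, int]]) -> Grid:
    """Pool with every window and average the results (assumed fusion rule)."""
    maps = [avg_pool(feature, w) for w in windows]
    rows, cols = len(feature), len(feature[0])
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
        for r in range(rows)
    ]


# A toy 3x5 map with one horizontal "text line" in the middle row.
stripe = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
# Hypothetical windows: wide, tall, and square.
fused = multi_scale_pool(stripe, [(1, 5), (5, 1), (3, 3)])
```

In this toy case the wide (1, 5) window keeps the full response along the horizontal stripe, while the tall and square windows dilute it with background, which is the intuition behind matching window shape to elongated text instances.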