Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (3): 228-236. DOI: 10.3778/j.issn.1002-8331.2209-0043

• Graphics and Image Processing •

Improved FCENet Algorithm for Natural Scene Text Detection

ZHOU Yan, LIAO Junwei, LIU Xiangyu, ZHOU Yuexia, ZENG Fanzhi   

  1. Department of Computer Science, Foshan University, Foshan, Guangdong 528000, China
  • Online: 2024-02-01 Published: 2024-02-01

改进FCENet的自然场景文本检测算法

周燕,廖俊玮,刘翔宇,周月霞,曾凡智   

  1. 佛山科学技术学院 计算机系,广东 佛山 528000

Abstract: To address the detection difficulties caused by complex backgrounds, variable scales, and curved shapes in natural scene text, this paper proposes an improved scene text detection algorithm based on FCENet (Fourier contour embedding network). The algorithm adds two modules to FCENet: a multi-scale residual feature enhancement module and a multi-scale attention feature fusion module. The multi-scale residual feature enhancement module serves as a residual branch at the top of the backbone network. It strengthens the top-down flow of high-level semantic information through the feature pyramid, improves text pixel classification, and reduces false detections. The multi-scale attention feature fusion module improves the fusion of features with different semantics and scales. Combined with a bottom-up feature fusion network, it avoids over-segmentation of text and improves the detection of curved text. Experimental results show that the proposed method achieves F-measures of 86.2% and 86.5% on the curved text datasets CTW1500 and Total-Text, respectively, which are 1.1 and 0.7 percentage points higher than the original FCENet.
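The reported figures can be read more concretely with a small worked example. The sketch below is illustrative only and is not taken from the paper: it assumes the standard definition of the F-measure as the harmonic mean of precision and recall, and it derives the FCENet baseline scores implied by the abstract by subtracting the stated gains from the stated results. The function names `f_measure` and `implied_baseline` are hypothetical, and the precision and recall values used in the usage line are made up, since the abstract reports neither.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units for both, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures stated in the abstract: F-measure (%) of the improved method and
# its gain over the original FCENet, in percentage points.
reported = {
    "CTW1500": {"f_measure": 86.2, "gain": 1.1},
    "Total-Text": {"f_measure": 86.5, "gain": 0.7},
}


def implied_baseline(entry: dict) -> float:
    """Baseline F-measure implied by the abstract (result minus gain).

    This is an inference from the abstract's own numbers, not a value
    quoted from the paper's tables.
    """
    return round(entry["f_measure"] - entry["gain"], 1)


baselines = {name: implied_baseline(entry) for name, entry in reported.items()}

# Hypothetical precision/recall pair, used only to show how the metric behaves.
example_f = f_measure(90.0, 80.0)

if __name__ == "__main__":
    for name, value in baselines.items():
        print(f"{name}: implied FCENet baseline F-measure = {value}%")
    print(f"F-measure for P=90.0, R=80.0: {example_f:.2f}")
```

Because the F-measure is a harmonic mean, it is pulled toward the lower of precision and recall, so a gain in F-measure usually reflects fewer false detections, fewer missed texts, or both.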

Key words: scene text detection, feature fusion, feature enhancement, attention mechanism, Fourier contour embedding network (FCENet)
