Improved FCENet Algorithm for Natural Scene Text Detection

doi:10.3778/j.issn.1002-8331.2209-0043

Abstract

Abstract: To address the detection difficulties caused by complex backgrounds, varying scales, and curved shapes in natural scene text detection, this paper proposes a scene text detection algorithm that improves on FCENet (Fourier contour embedding network). The algorithm builds on FCENet and adds a multi-scale residual feature enhancement module and a multi-scale attention feature fusion module. The multi-scale residual feature enhancement module serves as a residual branch at the top of the backbone network. It strengthens the top-down flow of high-level semantic information in the feature pyramid, improves text pixel classification, and effectively reduces false detections. The multi-scale attention feature fusion module lets features of different semantic levels and scales fuse more effectively. Combined with a bottom-up feature fusion network, it effectively avoids over-segmentation of text and improves the detection of curved text. Experimental results show that the proposed method reaches F-measures of 86.2% and 86.5% on the curved text datasets CTW1500 and Total-Text, respectively, which are 1.1 and 0.7 percentage points higher than the original FCENet.

Key words: scene text detection, feature fusion, feature enhancement, attention mechanism, Fourier contour embedding network (FCENet)
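The F-measure quoted in the abstract is the standard harmonic mean of precision and recall. The sketch below only illustrates that formula and the percentage-point arithmetic behind the reported gains. The function names and the precision/recall values are hypothetical and do not come from the paper; only the F-measures (86.2%, 86.5%) and the gains (1.1, 0.7 points) are taken from the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-measure, or F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(improved: float, gain_points: float) -> float:
    """Baseline score (in %) implied by an improved score and a gain in percentage points."""
    return round(improved - gain_points, 1)


# Hypothetical precision and recall, chosen only to illustrate the formula.
example_f = f_measure(0.90, 0.80)

# F-measures and gains as stated in the abstract.
baseline_ctw1500 = implied_baseline(86.2, 1.1)
baseline_total_text = implied_baseline(86.5, 0.7)

print(f"example F-measure: {example_f:.4f}")
print(f"implied FCENet baseline on CTW1500: {baseline_ctw1500}%")
print(f"implied FCENet baseline on Total-Text: {baseline_total_text}%")
```

Because the gains are stated in percentage points rather than as relative improvements, the implied baselines follow by simple subtraction from the reported scores.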

ZHOU Yan, LIAO Junwei, LIU Xiangyu, ZHOU Yuexia, ZENG Fanzhi. Improved FCENet Algorithm for Natural Scene Text Detection[J]. Computer Engineering and Applications, 2024, 60(3): 228-236.
