Arbitrary Shape Text Detection Based on High-Order Graph Convolution Reasoning Network

doi:10.3778/j.issn.1002-8331.2208-0247

Abstract

Abstract: General scene text detection is widely used in fields such as map navigation and autonomous driving. Scene text varies in orientation and takes complex, changing shapes, which makes detection difficult. To address this problem, a high-order graph convolution reasoning network is proposed. First, building on the text detection framework DRRG, a high-order graph scheme is designed and a high-order graph convolution reasoning network is proposed, which expands the reasoning range and effectively combines the auxiliary information provided by high-order neighbors. Second, the setting of first-order neighbors is improved to reduce interference from irrelevant components, which improves the efficiency of back-propagation and component linking. Finally, an SE aggregation module is introduced to generate an aggregation scheme independently and adaptively for each node, further improving the utilization of high-order information. Experimental results show that the F1 score of the improved network on the Total-Text, CTW-1500 and ICDAR2015 datasets increases by 1.4, 1.05 and 1.26 percentage points, respectively.

Key words: image processing, text detection, high-order graph convolutional network, relational reasoning network, SE aggregation
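
The high-order graph convolution with SE aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes that k-th order neighbors are reached through the k-th power of a normalized adjacency matrix, and that the SE-style module squeezes each order's features to one scalar per node before gating. All function names, shapes and weights below (`normalize_adjacency`, `high_order_gcn_layer`, `w_squeeze`, `w_excite`) are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def high_order_gcn_layer(x, adj, weights, w_squeeze, w_excite):
    """One high-order layer with per-node SE-style aggregation (assumed form).

    x:         (N, F_in) node features
    adj:       (N, N) binary adjacency of first-order neighbors
    weights:   list of K arrays, each (F_in, F_out), one per neighbor order
    w_squeeze: (K, H) and w_excite: (H, K), the two gating layers
    """
    a_hat = normalize_adjacency(adj)
    per_order = []
    propagated = x
    for w_k in weights:
        propagated = a_hat @ propagated      # now holds A_hat^k @ x
        per_order.append(np.maximum(propagated @ w_k, 0.0))  # ReLU
    stacked = np.stack(per_order, axis=1)    # (N, K, F_out)

    # Squeeze: one scalar per node and per order.
    squeezed = stacked.mean(axis=2)          # (N, K)
    # Excite: each node gets its own weight for every order.
    hidden = np.maximum(squeezed @ w_squeeze, 0.0)
    gates = sigmoid(hidden @ w_excite)       # (N, K), values in (0, 1)

    # Aggregate the K orders with the node-specific gates.
    out = (stacked * gates[:, :, None]).sum(axis=1)  # (N, F_out)
    return out, gates


# Toy example: 5 text components linked in a chain, orders 1 to 3.
rng = np.random.default_rng(0)
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = rng.normal(size=(5, 4))
weights = [rng.normal(scale=0.1, size=(4, 8)) for _ in range(3)]
w_squeeze = rng.normal(scale=0.1, size=(3, 2))
w_excite = rng.normal(scale=0.1, size=(2, 3))
out, gates = high_order_gcn_layer(x, adj, weights, w_squeeze, w_excite)
print(out.shape, gates.shape)
```

Because the gates are computed from each node's own squeezed features, two nodes in the same graph can weight first-, second- and third-order information differently, which is the property the abstract attributes to the SE aggregation module.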

LIU Ping, JIANG Yongfeng, ZHANG Liang. Arbitrary Shape Text Detection Based on High-Order Graph Convolution Reasoning Network[J]. Computer Engineering and Applications, 2024, 60(1): 263-270.
