Text Kernel Reconstruction and Expansion for Arbitrary Shape Text Detection

doi:10.3778/j.issn.1002-8331.2301-0074

Abstract

Abstract: Segmentation-based methods make pixel-level predictions of text in natural scenes and have substantially improved the detection of arbitrarily shaped text. However, separating adjacent text instances effectively remains a difficult problem. The widely adopted solution is to shrink the annotated text boundaries to obtain text kernels, which are then used to separate adjacent instances. When the network predicts text kernels directly, though, it discards most of the information outside the kernel, which degrades the performance of segmentation-based text detectors. To address this problem, a text kernel reconstruction algorithm is proposed that moves the generation of text kernels to the post-processing stage: the direction field predicted by the network is used to shrink each text instance inward, forming its text kernel. In addition, a text kernel expansion algorithm is proposed to restore the complete text instances from the text kernels. Experiments show that the proposed method achieves comparable or the best detection performance on three datasets: Total-Text (88.66%), CTW-1500 (87.28%), and MSRA-TD500 (90.65%).

Key words: scene text detection, arbitrary shape, text kernel
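The two post-processing steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the algorithmic details, so the sketch assumes a binary text mask, a per-pixel direction field of `(dy, dx)` vectors pointing toward the inside of each text instance, a fixed shift distance `steps`, and a breadth-first expansion from the kernels. All function names (`reconstruct_kernel`, `label_components`, `expand_kernels`) are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def reconstruct_kernel(mask, field, steps=1):
    """Shift each text pixel along its direction vector (assumed rule);
    the cells that pixels land on form the text kernel."""
    h, w = len(mask), len(mask[0])
    kernel = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            dy, dx = field[y][x]
            ty, tx = y + round(dy * steps), x + round(dx * steps)
            if 0 <= ty < h and 0 <= tx < w and mask[ty][tx]:
                kernel[ty][tx] = 1
    return kernel


def label_components(binary):
    """Label 4-connected components of a binary map with 1, 2, ..."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


def expand_kernels(kernel_labels, mask):
    """Grow every labelled kernel outward inside the text mask with a
    multi-source BFS; an unlabelled pixel takes the first label to reach it."""
    h, w = len(mask), len(mask[0])
    labels = [row[:] for row in kernel_labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w
                    and mask[ny][nx] and not labels[ny][nx]):
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels


# Toy input: two touching horizontal text lines (rows 0-2 and rows 3-5).
# The segmentation mask alone merges them into one connected region.
mask = [[1] * 6 for _ in range(6)]
# Hand-made direction field: each row points toward the centre row of its line.
field = [[(1 - (y % 3), 0) for _ in range(6)] for y in range(6)]

kernel = reconstruct_kernel(mask, field, steps=1)
instances = expand_kernels(label_components(kernel), mask)
```

In this toy case the mask is a single blob, while the reconstructed kernel consists of two separated rows, so the expansion step recovers two distinct instances. In a real pipeline the direction field would come from the network, and the shift distance and expansion rule would follow the paper's definitions rather than the simplifications assumed here.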

DENG Shengjun, CHEN Niannian. Text Kernel Reconstruction and Expansion for Arbitrary Shape Text Detection[J]. Computer Engineering and Applications, 2024, 60(9): 228-236.
