TextRail: Irregular Text Detection Algorithm in Complicated Natural Scenarios

doi:10.3778/j.issn.1002-8331.2206-0506

Abstract

Abstract: Text detection is the prerequisite and foundation of text recognition. In complicated natural scenarios, perspective, occlusion, and deformation degrade image quality; text appears in diverse styles and often in irregular shapes; and complex backgrounds interfere further. Together these factors make text detection difficult and lower its precision. For scenes with irregularly shaped text, a text rail model (TextRail) is proposed. It represents a text region by fiducial points on the region's upper and lower boundaries, which allows text of arbitrary shape to be detected efficiently. First, a fully convolutional network (FCN) and a feature pyramid network (FPN) extract features from the text image. Second, the features are fed into a detection head network that predicts the fiducial points on the upper and lower boundaries of each text region, and the predictions are merged by locality-aware non-maximum suppression (LNMS) to obtain the final upper and lower boundary fiducial points. Finally, thin plate spline (TPS) interpolation rectifies irregular text automatically on the basis of these fiducial points. Extensive experiments show that TextRail outperforms the other text detection models in F1-score. TextRail also represents the orientation, curvature, and deformation of text accurately, which improves the accuracy and robustness of irregular text detection.
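
The merging step named in the abstract can be illustrated with a minimal sketch of locality-aware merging in the two-pass style introduced with the EAST detector: neighbouring candidates are first fused by score-weighted averaging, then a standard greedy NMS removes the remaining overlaps. The abstract does not say how TextRail measures overlap or weights candidates, so the details below are assumptions: each candidate is a hypothetical `(score, points)` pair with a positive score and the same number of fiducial points (upper boundary first, then lower), and overlap is approximated by the IoU of the axis-aligned bounding boxes of the points.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical candidate layout: (score, fiducial points, upper boundary then lower).
Candidate = Tuple[float, List[Point]]


def bbox(points: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (x0, y0, x1, y1) of a point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_iou(a: List[Point], b: List[Point]) -> float:
    """IoU of the bounding boxes of two point sets (a simplifying assumption)."""
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a: Candidate, b: Candidate) -> Candidate:
    """Score-weighted average of corresponding points; scores are summed."""
    (sa, pa), (sb, pb) = a, b
    total = sa + sb  # assumed positive
    pts = [((sa * xa + sb * xb) / total, (sa * ya + sb * yb) / total)
           for (xa, ya), (xb, yb) in zip(pa, pb)]
    return total, pts


def lnms(candidates: List[Candidate], iou_thresh: float = 0.5) -> List[Candidate]:
    """Two-pass locality-aware NMS over candidates given in scan order."""
    # Pass 1: fuse each candidate with the running one when they overlap.
    merged: List[Candidate] = []
    current = None
    for cand in candidates:
        if current is not None and bbox_iou(current[1], cand[1]) > iou_thresh:
            current = weighted_merge(current, cand)
        else:
            if current is not None:
                merged.append(current)
            current = cand
    if current is not None:
        merged.append(current)
    # Pass 2: greedy NMS, highest accumulated score first.
    merged.sort(key=lambda c: c[0], reverse=True)
    kept: List[Candidate] = []
    for cand in merged:
        if all(bbox_iou(cand[1], k[1]) <= iou_thresh for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    cands = [
        (0.9, [(0, 0), (10, 0), (0, 4), (10, 4)]),
        (0.6, [(1, 0), (11, 0), (1, 4), (11, 4)]),
        (0.8, [(50, 50), (60, 50), (50, 54), (60, 54)]),
    ]
    for score, pts in lnms(cands):
        print(round(score, 3), pts)
```

Fusing neighbours before suppression lets low-scoring candidates still contribute to the final point positions instead of being discarded outright, which is the usual motivation for the locality-aware variant.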

Key words: complicated natural scenarios, irregular text detection, text rectification, fiducial point, TextRail model
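
The abstract reports its comparison in terms of F1-score, the harmonic mean of precision and recall. The sketch below only shows how that number follows from detection counts; the function name, its parameters, and the example counts are hypothetical, and the rule for deciding when a detection matches a ground-truth text region is not given in the abstract and is not modelled here.

```python
from typing import Tuple


def detection_f1(true_positives: int,
                 num_detections: int,
                 num_ground_truth: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from matched-detection counts."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 8 correct out of 10 detections, 16 ground-truth regions.
    p, r, f1 = detection_f1(8, 10, 16)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```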

MA Jing, XUE Hao, GUO Xiaoyu. TextRail: Irregular Text Detection Algorithm in Complicated Natural Scenarios[J]. Computer Engineering and Applications, 2023, 59(21): 112-122.
