Arbitrary-Shape Text Detection via Text Center Pixel Reconstruction

doi:10.3778/j.issn.1002-8331.2112-0108

Abstract

Existing natural-scene text detection algorithms cannot detect arbitrary-shaped text end to end both efficiently and accurately. To address this, a lightweight text detection algorithm is proposed that reconstructs text instances from text kernels by pixel clustering. A lightweight network has weak features and a small receptive field. To compensate, an image-level context module is designed to capture global image information, and a semantic-level context module is designed to learn target-region information. Fusing the two enriches the network's features and safeguards detection accuracy. To separate adjacent text instances and locate curved text, the method draws on text-kernel approaches. It treats the center of each text instance as a clustering center and reconstructs the complete instance from its kernel with a single pass of pixel clustering, so that text of arbitrary shape can be detected. On the curved-text datasets Total-Text and CTW1500, the method reaches comprehensive scores of 84.1% and 84.6%, surpassing the best compared method, CRAFT. Its detection speed of 42 frames/s also exceeds that of EAST, the best-performing baseline on speed. The method therefore detects arbitrary-shaped text both efficiently and accurately and is better suited to practical applications.

Key words: image-level context, semantic-level context, pixel clustering, arbitrary-shape text, text kernel
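The abstract's core step grows each text kernel back into a full text instance in one clustering pass. The following is a minimal sketch of that idea under stated assumptions and is not the authors' implementation. It assumes the network already outputs a binary text-region map and a binary text-kernel map, given as non-empty 2D lists of 0/1. Each 4-connected kernel component is treated as one cluster center. The remaining text pixels are then assigned by a single multi-source breadth-first expansion, in the style of PSENet-like kernel expansion. The paper's clustering may instead use learned pixel similarities. The function names `label_kernels` and `reconstruct_instances` are hypothetical.

```python
from collections import deque
from typing import List

Grid = List[List[int]]

# 4-connectivity: down, up, right, left
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_kernels(kernel: Grid) -> Grid:
    """Give each 4-connected kernel component its own positive label (0 = background)."""
    h, w = len(kernel), len(kernel[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if kernel[y][x] and not labels[y][x]:
                # New component found: flood-fill it with a fresh label.
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def reconstruct_instances(text_region: Grid, kernel: Grid) -> Grid:
    """Grow every kernel label into the text region with one multi-source BFS pass.

    A text pixel takes the label of whichever kernel frontier reaches it first;
    on ties, the frontier dequeued first wins. Text pixels that are not
    connected to any kernel keep label 0.
    """
    labels = label_kernels(kernel)
    h, w = len(labels), len(labels[0])
    # Seed the queue with every kernel pixel so all instances expand simultaneously.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if 0 <= ny < h and 0 <= nx < w and text_region[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels
```

Consider two words whose text regions touch. `reconstruct_instances([[1, 1, 1, 1, 1, 1]], [[1, 0, 0, 0, 0, 1]])` returns `[[1, 1, 1, 2, 2, 2]]`. The number of instances is decided by the separated kernels rather than by the merged text region, which is why kernel-based methods can split adjacent text.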

LEI Xiaotang, HU Jing. Text Center Pixel Reconstruction to Achieve Efficient Arbitrary Shape Text Detection[J]. Computer Engineering and Applications, 2023, 59(8): 148-156.
