Scene Text Detection Model Based on Double Tower Structure

doi:10.3778/j.issn.1002-8331.2009-0076

Abstract: Text regions in natural scenes vary widely in shape, which makes them difficult for traditional anchor-based methods to localize accurately. To address this problem, a text segmentation and detection algorithm with a double-tower structure is proposed. A bottom-up feature enhancement path is added to the network to refine semantic information more fully; together with the preceding top-down structure, it forms a double-pyramid (double-tower) model. A new path is then added to shorten the distance between the lower-level and topmost feature layers, and dilated convolution is used to enlarge the receptive field of the convolution kernels. Finally, a parameter γ is introduced into the loss function to change the weighting of positive and negative samples, so that the network focuses more on hard samples. Evaluation on the benchmark datasets ICDAR2015 and ICDAR2017 shows that the proposed double-tower model effectively improves the accuracy of text region detection.
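The abstract does not state the loss formula. As a rough illustration only, the sketch below assumes a focal-loss-style form, in which γ down-weights well-classified pixels so that hard pixels dominate the loss. The function names, the `alpha` class-balancing term, and the default values are assumptions made for this example and are not taken from the paper.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal-style loss for a single pixel (illustrative sketch).

    p     -- predicted probability that the pixel is text, in [0, 1]
    y     -- ground-truth label: 1 for text, 0 for background
    gamma -- focusing parameter; gamma = 0 gives weighted cross-entropy
    alpha -- assumed weight for the positive class (hypothetical default)
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of easy, confidently correct pixels
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average the per-pixel loss over a flat list of predictions."""
    losses = [focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# An easy text pixel (p = 0.95) versus a hard one (p = 0.30)
easy = focal_loss(0.95, 1)
hard = focal_loss(0.30, 1)
ratio_with_gamma = hard / easy
ratio_without_gamma = focal_loss(0.30, 1, gamma=0.0) / focal_loss(0.95, 1, gamma=0.0)
```

With γ = 0 the hard pixel already costs more than the easy one. Raising γ widens that gap considerably, which is the sense in which the network "focuses more on hard samples".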

Key words: convolutional neural network, feature fusion, text detection, image segmentation

SHI Yihan, TONG Minglei, ZHANG Kui, YAO Hongyang. Scene Text Detection Model Based on Double Tower Structure[J]. Computer Engineering and Applications, 2022, 58(3): 242-248.
