Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (3): 242-248. DOI: 10.3778/j.issn.1002-8331.2009-0076

• Graphics and Image Processing •

基于双塔结构的场景文字检测模型

施漪涵,仝明磊,张魁,姚宏扬   

  1. 上海电力大学 电子与信息工程学院,上海 200090
  • 出版日期:2022-02-01 发布日期:2022-01-28

Scene Text Detection Model Based on Double Tower Structure

SHI Yihan, TONG Minglei, ZHANG Kui, YAO Hongyang   

  1. School of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China
  • Online:2022-02-01 Published:2022-01-28

摘要: 当图像中文字区域形状复杂多变时,传统锚点方法难以精确定位文字,针对这一问题,提出一种具有双塔结构的文字分割检测算法。在网络中增加自下而上的特征增强路径以充分提炼语义信息,与上一级自上而下的结构形成双金字塔模型;接着新增一条路径缩短较底层与最顶层特征之间的距离,同时使用膨胀卷积,增大卷积核的感受野;在损失函数的设计中引入[γ]参数,改变图像中正负样本的权重分配,使网络更关注困难样本。在标准数据集ICDAR2015和ICDAR2017上进行评估,实验结果表明提出的双塔结构模型能有效提高网络对文字区域的检测准确度。

关键词: 卷积神经网络, 特征融合, 文字检测, 图像分割

Abstract: Traditional anchor-based methods struggle to localize text accurately when the text regions in an image have complex and highly variable shapes. To tackle this problem, a segmentation-based text detection algorithm with a double-tower structure is proposed. A bottom-up feature-enhancement path is added to the network to fully refine semantic information; together with the preceding top-down structure, it forms a double-pyramid (double-tower) model. A new path is then added to shorten the distance between the lower-level features and the topmost features, and dilated convolution is used to enlarge the receptive field of the convolution kernels. Finally, a parameter γ is introduced into the loss function to change the weighting of positive and negative samples, so that the network focuses more on hard samples. Evaluated on the benchmark datasets ICDAR2015 and ICDAR2017, the experimental results show that the proposed double-tower model effectively improves the accuracy of text-region detection.

Key words: convolutional neural network, feature fusion, text detection, image segmentation
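
Two ideas in the abstract can be illustrated numerically: dilated convolution widening the span a kernel covers, and a γ parameter that down-weights easy samples in the loss. The sketch below is only an illustration under stated assumptions. The abstract does not give the exact loss, so a focal-loss-style modulation of binary cross-entropy is assumed here. The function names and the default `gamma=2.0` are hypothetical and are not taken from the paper.

```python
import math


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def focal_style_bce(p: float, target: int, gamma: float = 2.0,
                    eps: float = 1e-7) -> float:
    """Per-pixel binary cross-entropy scaled by (1 - p_t) ** gamma.

    ASSUMPTION: focal-loss-style weighting; the paper's exact formula
    is not stated in the abstract.
    p      -- predicted probability that the pixel is text
    target -- 1 for a text pixel, 0 for background
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if target == 1 else 1.0 - p      # probability of the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # A 3x3 kernel with dilation 2 spans 5 pixels per axis.
    print(effective_kernel_size(3, 2))
    # Compare an easy text pixel (p = 0.9) with a hard one (p = 0.1).
    for gamma in (0.0, 2.0):
        easy = focal_style_bce(0.9, 1, gamma)
        hard = focal_style_bce(0.1, 1, gamma)
        print(gamma, easy, hard, hard / easy)
```

With `gamma = 0` the function reduces to ordinary cross-entropy. Increasing `gamma` shrinks the loss of well-classified pixels much faster than that of misclassified ones, which is the sense in which the network is pushed to focus on hard samples.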