Computer Engineering and Applications, 2021, Vol. 57, Issue (6): 159-167. DOI: 10.3778/j.issn.1002-8331.2008-0015


Direct and Efficient Natural Scene Chinese Character Approaching Spotting Method

ZHAO Fan, ZHANG Lin, WEN Zhiquan, YANG Linlin, LIN Guangfeng   

  1. Department of Information Science, School of Printing, Packaging and Digital Media, Xi’an University of Technology, Xi’an 710048, China
  • Online: 2021-03-15  Published: 2021-03-12





To improve the accuracy of classic object detection algorithms for text localization in natural scenes, and to overcome the incorrect segmentation of Chinese characters by traditional character detection models caused by the non-connectivity between strokes, this paper proposes a direct and efficient Chinese text spotting method. First, text boxes are detected with the EAST algorithm. Second, each detected text box is adjusted so that it is more compact and encloses the text more completely; this adjustment comprises connected component extraction, Chinese character segmentation, and text shape approximation. Finally, the extracted text regions are rectified and transcribed. Experimental results show that, while maintaining 3.2 frames per second, the proposed algorithm achieves F-scores of 83.5%, 72.8%, and 81.1% on the text localization task of three multi-oriented text datasets, ICDAR2015, ICDAR2017-MLT, and MSRA-TD500, respectively. Ablation experiments verify the effectiveness of each module in the proposed algorithm. On the combined detection and recognition evaluation task on the ICDAR2015 dataset, the proposed method also outperforms some recent methods.

Key words: text detection, text spotting, text recognition, convolutional neural network, multi-oriented text, spectral clustering
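
The stroke non-connectivity problem named in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it labels 8-connected foreground components in a small binary grid, then merges components whose horizontal extents overlap, a simple heuristic assumed here purely for illustration. The function names `connected_components` and `merge_by_x_overlap` and the toy glyph are hypothetical.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground regions."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def merge_by_x_overlap(boxes):
    """Merge boxes whose horizontal extents overlap (illustrative heuristic only)."""
    merged = []
    for box in sorted(boxes):  # sorted by left edge x0
        if merged and box[0] <= merged[-1][2]:
            m = merged[-1]
            merged[-1] = (m[0], min(m[1], box[1]), max(m[2], box[2]), max(m[3], box[3]))
        else:
            merged.append(box)
    return merged


# Toy glyph resembling "二": two horizontal strokes that do not touch.
glyph = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
strokes = connected_components(glyph)      # one box per disconnected stroke
characters = merge_by_x_overlap(strokes)   # strokes grouped into one character box
```

A component-based detector sees two separate regions in this glyph, which is why a grouping step is needed before the character can be treated as a single unit. A real system would need a more robust grouping criterion than horizontal overlap.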


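To improve the accuracy of classic object detection algorithms for text localization in natural scenes, and to overcome the incorrect segmentation of Chinese characters that traditional character detection models produce because strokes are not connected, a direct and efficient Chinese character approaching localization method for natural scenes is proposed. The classic EAST algorithm is used to detect text in scene images. The initially detected text boxes are then adjusted so that they enclose the text more compactly and more completely; this step consists of three parts: extraction of connected stroke components, Chinese character segmentation, and text shape approximation. The text regions are then rectified and their content is recognized. Experimental results show that, while maintaining an average frame rate of 3.1 frames/s, the proposed algorithm achieves F-scores of 83.5%, 72.8%, and 81.1% on the text localization task of the three multi-oriented datasets ICDAR2015, ICDAR2017-MLT, and MSRA-TD500, respectively; ablation experiments verify the effectiveness of each module in the algorithm. The performance on the combined detection and recognition evaluation task on the ICDAR2015 dataset also shows that the method outperforms some recent methods.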
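
The reported F-scores combine precision and recall into one figure. Assuming the standard definition (the harmonic mean of precision and recall, which the abstract does not spell out), the metric can be computed as in the sketch below. The function name `f_score` and the sample inputs are illustrative and are not values from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only, not results reported in the paper.
example = f_score(0.8, 0.9)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-score by maximizing only precision or only recall.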
