Computer Engineering and Applications, 2022, Vol. 58, Issue (4): 52-63. DOI: 10.3778/j.issn.1002-8331.2106-0411

• Research Hotspots and Reviews •

Application of Scene Text Recognition Technology Based on Deep Learning: A Survey

LIU Yanju, YI Xinhai, LI Yange, ZHANG Huiyu, LIU Yanzhong   

  1. School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
  2. School of Computer and Control Engineering, Qiqihar University, Qiqihar, Heilongjiang 161000, China
  • Online: 2022-02-15 Published: 2022-02-15

深度学习在场景文字识别技术中的应用综述

刘艳菊,伊鑫海,李炎阁,张惠玉,刘彦忠   

  1. 南京特殊教育师范学院 数学与信息科学学院,南京 210038
  2. 齐齐哈尔大学 计算机与控制工程学院,黑龙江 齐齐哈尔 161000

Abstract: With the development of deep learning in computer vision, scene text detection and text recognition have made breakthrough progress. Because natural scenes involve extreme lighting, occlusion, blur, and text in multiple orientations and at multiple scales, unconstrained scene text detection and recognition still face considerable challenges. This paper examines scene text detection and text recognition in depth from the perspective of deep learning. For text detection, it concludes that combining the advantages of segmentation-based and regression-based methods can address the low recall of small text regions while also adapting to multi-scale text. For text recognition, it concludes that combining the CTC mechanism with the Attention mechanism lets the two supervise each other, which improves recognition performance and reduces the error rate on long text.

Key words: deep learning, computer vision, natural scene, text detection, text recognition

摘要: 随着深度学习技术在计算机视觉领域的发展,场景文本检测与文字识别技术也有了突破性的进展。受到自然场景下极端光照、遮挡、模糊、多方向多尺度等情况的影响,无约束的场景文本检测与识别仍然面临着巨大的挑战。从深度学习的角度对场景文本检测和文字识别技术进行深入研究,总结出在文本检测技术中将基于分割的方法与回归的方法优势相结合,可以解决小文本区域的召回率较低的问题,同时适应多尺度文本;在文本识别方法中将CTC机制与Attention机制相结合,可以相互监督以提升识别性能,降低长文本识别的出错率。

关键词: 深度学习, 计算机视觉, 自然场景, 文本检测, 文字识别