Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (21): 112-122. DOI: 10.3778/j.issn.1002-8331.2206-0506

• Pattern Recognition and Artificial Intelligence •

TextRail: Irregular Text Detection Algorithm in Complicated Natural Scenarios

MA Jing, XUE Hao, GUO Xiaoyu

  1. College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2023-11-01 Published: 2023-11-01

Abstract: Text detection is the prerequisite and foundation of text recognition. In complicated natural scenarios, perspective, occlusion, and deformation degrade image quality; the text itself appears in many styles and is often irregular in shape; and complex backgrounds add further interference. Together these factors make text detection difficult and its accuracy low. For scenes with irregularly shaped text, this paper proposes a text-rail model (TextRail), which represents a text region by fiducial points on its upper and lower boundaries and thereby detects text of arbitrary shape efficiently. First, a fully convolutional network (FCN) and a feature pyramid network (FPN) extract features from the text image. Second, the features are fed into a detection head network that predicts fiducial points on the upper and lower boundaries of each text region, and the predictions are merged by locality-aware non-maximum suppression (LNMS) to obtain the final fiducial points. Finally, thin plate spline (TPS) interpolation automatically rectifies the irregular text based on these fiducial points. Extensive experiments show that TextRail achieves a higher F1-score than the other text detection models compared. TextRail also accurately represents the orientation, bending, and deformation of text, which effectively improves the accuracy and robustness of irregular text detection.

Key words: complicated natural scenarios, irregular text detection, text rectification, fiducial point, TextRail model
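
The idea of describing a text region by upper and lower boundary fiducial points, and the F1-score used for evaluation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the point ordering (left to right along each boundary), and the sample coordinates are assumptions made for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rails_to_polygon(upper: List[Point], lower: List[Point]) -> List[Point]:
    """Join the upper points (left to right) with the lower points
    (right to left) to form a closed outline of the text region.
    Assumes both boundaries carry the same number of points."""
    if len(upper) != len(lower) or len(upper) < 2:
        raise ValueError("need the same number (>= 2) of upper and lower points")
    return list(upper) + list(reversed(lower))


def polygon_area(poly: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical curved text line: three fiducial points per boundary.
upper = [(0.0, 0.0), (10.0, -2.0), (20.0, 0.0)]
lower = [(0.0, 5.0), (10.0, 3.0), (20.0, 5.0)]
polygon = rails_to_polygon(upper, lower)
area = polygon_area(polygon)
```

Because matching upper and lower points stay paired, such a representation keeps the reading direction and local curvature of the text, which is the information a rectification step such as TPS would need as control points.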