Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (24): 246-248.

• Engineering and Applications •

一种新的自然场景标志牌文本提取算法

张冬梅,张全元,郑 达,郑 蔚,李 晖,戴光明   

  1. 中国地质大学 计算机学院,武汉 430074
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2007-08-21 发布日期:2007-08-21
  • 通讯作者: 张冬梅

Novel algorithm for sign text extraction from natural scenes

ZHANG Dong-mei,ZHANG Quan-yuan,ZHENG Da,ZHENG Wei,LI Hui,DAI Guang-ming   

  1. Department of Computer Science, China University of Geosciences, Wuhan 430074, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-08-21 Published:2007-08-21
  • Contact: ZHANG Dong-mei

摘要: 从复杂的自然场景标志牌图像中提取和识别字符一直是数字图像处理领域的热点问题,目前的求解算法普遍存在提取文本精确度不高,提取率偏低,鲁棒性差等缺点。提出一种高效的文本提取算法,针对标志牌文本图像通常具有较复杂的自然背景等特征,首先对原始图片进行模糊化处理,然后进行Laplacian边缘提取,再对边缘图像进行非文本长边缘的删除,最后根据文本区域的特征进行边缘扫描和连通域分析实现标志牌文本的提取。通过对2003年国际自然场景文本识别竞赛(ICDAR’2003 Robust Reading Competition)中大量图片测试表明,该算法对背景的复杂度、文字语言、颜色、大小字体以及排列方向具有较强的鲁棒性,同时也具有较高的准确率(Precision)和提取率(Recall)。

关键词: 自然场景, 文本提取, 连通域分析, 边缘扫描

Abstract: Extracting and recognizing characters in images of signs in complex natural scenes has long been an active topic in digital image processing. Existing algorithms generally suffer from low precision, low recall, and poor robustness. This paper presents an efficient text extraction algorithm for sign images, which typically have complex natural backgrounds. The original image is first blurred, and edges are then extracted with the Laplacian operator. Long non-text edges are deleted from the edge image, and the sign text is finally extracted by edge scanning and connected component analysis based on the characteristics of text regions. Tests on a large number of images from the ICDAR’2003 Robust Reading Competition show that the algorithm is robust to background complexity, text language, color, font size, and text orientation, and that it achieves high precision and recall.

Key words: natural scene, text extraction, connected component analysis, edge scan