Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (4): 200-204. DOI: 10.3778/j.issn.1002-8331.1506-0160

• Graphics and Image Processing •

Chinese text localization based on Adaboost algorithm in natural images

YIN Fang1,2, ZHENG Liang1, CHEN Tiantian1

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  2. Instrument Science and Technology Postdoctoral Research Station, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2017-02-15 Published: 2017-05-11

Abstract: A novel method for localizing Chinese text in natural scene images, based on the Adaboost algorithm, is proposed. First, text regions are detected using edge features: the digital image undergoes edge extraction and binarization, and connected-component analysis then removes obvious non-character components to yield candidate text regions. Second, four classes of features are extracted by analyzing Chinese text regions in scenes, and these features are used to construct an Adaboost strong classifier built from classification and regression trees (CART). Finally, the candidate regions are fed into the strong classifier to obtain the correct text regions. Experimental results show that the method localizes text well in scene images whose text varies in font, size, and color, and that it achieves high recall and precision.

Key words: text location, text recognition, connected domain, classification and regression tree
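
The two stages described in the abstract (connected-component analysis to obtain candidate regions, then an Adaboost strong classifier to accept or reject them) can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes edge extraction and binarization have already produced a 0/1 grid, it replaces the paper's four feature classes (not specified in the abstract) with a single hypothetical feature, the bounding-box aspect ratio, and it uses depth-1 trees (decision stumps) as the simplest CART-style weak learner. The training values and all function names are made up for illustration.

```python
import math
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right, area) of 4-connected foreground blobs."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right, area))
    return boxes


def aspect_ratio(box):
    """Hypothetical feature: bounding-box width divided by height."""
    top, left, bottom, right, _ = box
    return (right - left + 1) / (bottom - top + 1)


def stump_predict(stump, x):
    """Depth-1 tree: output `sign` when feature j is <= threshold, otherwise -sign."""
    _, j, threshold, sign = stump
    return sign if x[j] <= threshold else -sign


def best_stump(X, y, w):
    """Exhaustively pick the stump (error, feature, threshold, sign) with minimum weighted error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            for sign in (1, -1):
                candidate = (0.0, j, threshold, sign)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, sign)
    return best


def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, x):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Made-up training set: near-square regions are text (+1), very thin or very wide ones are not (-1).
train_x = [[0.1], [0.2], [1.0], [1.1], [5.0], [6.0]]
train_y = [-1, -1, 1, 1, -1, -1]
model = adaboost_fit(train_x, train_y, rounds=3)

# Toy binarized image: one 2x2 blob and one 1x3 horizontal bar.
binary = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
boxes = connected_components(binary)
labels = [adaboost_predict(model, [aspect_ratio(b)]) for b in boxes]
print(list(zip(boxes, labels)))
```

No single threshold on the aspect ratio separates the toy classes, because the text-like values lie between two groups of non-text values; the weighted vote of three stumps is what carves out that middle interval. This is the basic reason for combining weak tree classifiers into a strong one.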