Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (7): 170-175. DOI: 10.3778/j.issn.1002-8331.1706-0291

• Pattern Recognition and Artificial Intelligence •

Recognition of print file based on character image segmentation

CHEN Qinghu, ZHOU Xiaodan, YAN Yuchen

  1. Electronic Information School, Wuhan University, Wuhan 430072, China
  • Online: 2018-04-01 Published: 2018-04-16

Abstract: Existing print file recognition methods require that the samples contain identical characters. To remove this restriction, this paper proposes a print file recognition method based on character image segmentation. The k-means algorithm is used to segment each character image into regions, and local binary pattern (LBP) texture features are extracted from each region separately, which eliminates the influence of character structure on the recognition result. The classification performance of single-region feature sets and of combined feature sets is studied. Experimental results show that the proposed method achieves high recognition accuracy even when the samples share no identical characters.

Key words: print file recognition, character structure, image segmentation, k-means algorithm, local binary pattern (LBP)
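
The pipeline outlined in the abstract (k-means segmentation of a character image, followed by per-region LBP texture features) can be illustrated with a minimal sketch. The abstract does not say what the k-means step clusters on, how many regions are used, or which LBP variant is applied, so everything below is an assumption made for illustration: pixels are clustered by gray level into `k=3` regions, the basic 3x3 LBP operator is used, and the function names `kmeans_1d`, `lbp_codes`, and `region_lbp_features` are hypothetical, not the authors' implementation.

```python
import numpy as np

def kmeans_1d(values, k=3, iters=20):
    """Lloyd's k-means on scalar values; labels are ordered by centre (0 = darkest)."""
    values = np.asarray(values, dtype=np.float64)
    # Deterministic initialisation: spread the centres over the intensity quantiles.
    centres = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = values[labels == j].mean()
    # Final assignment against the converged centres.
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centres)] = np.arange(k)
    return rank[labels]

def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel (border pixels are dropped)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= centre).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def region_lbp_features(img, k=3):
    """Concatenate one normalised 256-bin LBP histogram per k-means region."""
    img = np.asarray(img, dtype=np.float64)
    # Assumption: regions are defined by clustering gray levels only.
    labels = kmeans_1d(img.ravel(), k).reshape(img.shape)[1:-1, 1:-1]
    codes = lbp_codes(img)
    feats = []
    for j in range(k):
        hist = np.bincount(codes[labels == j], minlength=256).astype(np.float64)
        feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)
```

Under these assumptions, each 256-value slice of the returned vector plays the role of a single-region feature set, and the full concatenation plays the role of a combined feature set; either could be passed to any standard classifier.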