Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (5): 178-182.


Printed image layout segmentation method based on Chinese character connected component

FU Lujing1, QIAN Junhao1, ZHONG Yunfei2   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Packaging and Materials Engineering, Hunan University of Technology, Zhuzhou, Hunan 412007, China
  • Online: 2015-03-01  Published: 2015-04-08

基于汉字连通分量的印刷图像版面分割方法

付芦静1,钱军浩1,钟云飞2   

  1. 江南大学 物联网工程学院,江苏 无锡 214122
  2. 湖南工业大学 包装与材料工程学院,湖南 株洲 412007

Abstract: Color printed images have richly colored backgrounds, and a single Chinese character often consists of several connected components, so text segmentation algorithms based on connected domains cannot extract text accurately. To address this, a layout segmentation method for color printed images based on Chinese character connected components is proposed. The image is first preprocessed with a pyramid-transform inverse halftoning algorithm. Its colors are then segmented through color sampling and mean shift, and the text connected components are labeled. The connected components belonging to each Chinese character are reconstructed according to the structure of Chinese characters and the features of the connected components. Finally, the connection relations among the character connected components are analyzed to determine the text orientation and complete the text segmentation. Experimental results show that the method effectively reconstructs Chinese character connected components and segments text of different fonts, font sizes, and colors in color printed images.

Key words: text segmentation, connected component reconstruction, inverse halftoning, color sampling, mean shift, cluster center

摘要: 针对彩色印刷图像背景色彩丰富和汉字存在多个连通分量,连通域文字分割算法不能精确提取文字,提出基于汉字连通分量的彩色印刷图像版面分割方法。利用金字塔变换逆半调算法对图像进行预处理,通过颜色采样和均值偏移分割图像颜色,标记文字连通分量,根据汉字结构和连通分量特性重建汉字连通分量,分析文字连通分量连接关系确定文字排列方向实现文字分割。实验结果表明,该方法能够有效地重建汉字连通分量,在彩色印刷图像中实现对不同字体、字号、颜色的文字分割。

关键词: 文字分割, 连通分量重建, 逆半调, 颜色采样, 均值偏移, 聚类中心
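
The abstract outlines three steps that can be illustrated on a toy binary image: labeling connected components, regrouping the components that belong to one character, and inferring the text orientation. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the reconstruction rules, so the gap-based merge rule, the nearest-neighbor orientation vote, the `max_gap` parameter, and all function names are hypothetical choices made for illustration. The inverse halftoning and mean shift color segmentation stages are omitted, and the input is assumed to be an already binarized grid.

```python
from collections import deque


def label_components(grid):
    """Return bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions of a binary grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_gap(a, b):
    """Number of empty rows or columns separating two boxes
    (0 if they touch or overlap)."""
    dy = max(a[0] - b[2] - 1, b[0] - a[2] - 1, 0)
    dx = max(a[1] - b[3] - 1, b[1] - a[3] - 1, 0)
    return max(dy, dx)


def merge_boxes(boxes, max_gap=1):
    """Hypothetical reconstruction rule: repeatedly merge any two
    boxes whose gap is at most max_gap, until nothing changes."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if box_gap(a, b) <= max_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return sorted(boxes)


def text_orientation(boxes):
    """Hypothetical vote: each box looks at its nearest neighbor and
    votes by whether the offset is mostly horizontal or vertical.
    Ties and single boxes default to 'horizontal'."""
    votes = 0
    for i, a in enumerate(boxes):
        best = None
        for j, b in enumerate(boxes):
            if i == j:
                continue
            dy = abs((a[0] + a[2]) - (b[0] + b[2]))  # doubled center offset
            dx = abs((a[1] + a[3]) - (b[1] + b[3]))
            if best is None or dy + dx < best[0] + best[1]:
                best = (dy, dx)
        if best is not None:
            votes += 1 if best[1] >= best[0] else -1
    return "horizontal" if votes >= 0 else "vertical"
```

For example, a grid containing two copies of a two-stroke shape (two horizontal bars one row apart, similar to the character 二) yields four labeled components; `merge_boxes` with `max_gap=1` regroups them into two character boxes, and `text_orientation` reports a horizontal layout because the two boxes sit side by side. A fixed pixel gap is a crude stand-in for the structural rules the paper refers to; in practice the threshold would need to scale with font size.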