Research on segmentation of historical Chinese books

Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (2): 29-33.

NI Enzhi (1), JIANG Minjun (2), ZHOU Changle (1)

1. Mind, Art and Computation Lab, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
2. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China

Online: 2013-01-15 Published: 2013-01-16

Original Chinese title: 古代汉字文献切分研究

Authors (Chinese names): 倪恩志, 蒋旻隽, 周昌乐

Abstract

This paper proposes text line segmentation and character segmentation methods tailored to the characteristics of historical Chinese documents. The line segmentation method analyzes the stroke projection and adopts a recursive segmentation algorithm based on multiple projection thresholds and gap thresholds. The algorithm is robust when text lines touch one another or are skewed, and it performs particularly well on short text lines. The character segmentation method has two steps. First, a rough segmentation finds the approximate cut positions. Second, a fine segmentation based on connected-component analysis and the identification of adhesion points refines them. The algorithm can extract characters even when they overlap or touch one another. Experimental results show that the methods perform well and are suitable for segmenting historical Chinese documents.

Key words: document image processing, Chinese character segmentation, digitization of ancient books
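The recursive, projection-based line segmentation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the rule for tightening thresholds (raise the projection threshold by 1, shrink the gap threshold by 1), the `max_width` trigger and the `max_depth` limit are all assumptions made for illustration. Because historical Chinese text runs vertically, a text line is treated here as a column and ink is projected onto the x axis.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open pixel range [start, end)


def column_projection(img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in every pixel column of a binary image."""
    return [sum(col) for col in zip(*img)]


def split_runs(profile: List[int], start: int, end: int,
               proj_thresh: int, gap_thresh: int) -> List[Segment]:
    """Split [start, end) wherever the profile stays <= proj_thresh
    for at least gap_thresh consecutive positions."""
    segments = []
    seg_start = None  # first index of the current run of ink
    last_ink = None   # last index whose projection exceeded proj_thresh
    gap = 0           # length of the current low-projection stretch
    for x in range(start, end):
        if profile[x] > proj_thresh:
            if seg_start is None:
                seg_start = x
            elif gap >= gap_thresh:
                segments.append((seg_start, last_ink + 1))
                seg_start = x
            last_ink = x
            gap = 0
        else:
            gap += 1
    if seg_start is not None:
        segments.append((seg_start, last_ink + 1))
    return segments


def segment_columns(profile: List[int], start: int, end: int,
                    proj_thresh: int, gap_thresh: int, max_width: int,
                    depth: int = 0, max_depth: int = 8) -> List[Segment]:
    """Recursively re-split segments wider than max_width using a higher
    projection threshold and a smaller gap threshold (assumed rule)."""
    result = []
    for s, e in split_runs(profile, start, end, proj_thresh, gap_thresh):
        if e - s > max_width and depth < max_depth:
            parts = segment_columns(profile, s, e, proj_thresh + 1,
                                    max(1, gap_thresh - 1), max_width,
                                    depth + 1, max_depth)
            if len(parts) >= 2:  # accept only a genuine split
                result.extend(parts)
                continue
        result.append((s, e))
    return result


# Two columns joined by a weak bridge (value 1 at index 5), then a third column.
profile = [0, 0, 5, 6, 5, 1, 5, 6, 5, 0, 0, 0, 4, 5, 4, 0]
print(segment_columns(profile, 0, len(profile),
                      proj_thresh=0, gap_thresh=2, max_width=4))
# prints [(2, 5), (6, 9), (12, 15)]
```

In this toy profile the first pass cannot separate the two touching columns because the bridge never drops to zero. The second pass, with a stricter projection threshold and a shorter required gap, cuts at the bridge. A segment that cannot be split further is returned unchanged.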

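Abstract (translated from the Chinese): Column segmentation and character segmentation methods suited to the characteristics of historical Chinese documents are proposed. The column segmentation method analyzes the stroke projection of the document directly and uses a recursive segmentation algorithm based on layered projection filtering and variable-length gap thresholds. The algorithm extracts columns accurately even when the spacing between columns is small, when columns touch the ruling lines, or when the document is skewed to some degree, and it performs particularly well on short columns. The character segmentation method has two steps: a coarse segmentation determines the approximate cut positions, and a fine segmentation based on connected-component analysis and the identification of adhesion points refines them. For columns containing many touching and overlapping characters, the algorithm still segments complete individual characters well. Experimental results show that the proposed methods achieve good results on the segmentation of historical Chinese documents.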

Key words (translated from the Chinese): document image processing, document segmentation, digitization of ancient books
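The two-step character segmentation can be sketched in the same hedged spirit. The example below assumes that the coarse step has already produced approximate cut rows (the hypothetical `cuts` list) and illustrates only the connected-component part of the fine step: each component is assigned to the coarse cell that contains its vertical center, and a character box is the union of its components. The adhesion-point judgment for characters that actually touch is not covered, and all names are illustrative rather than taken from the paper.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def connected_components(img: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected ink regions, in raster-scan order."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y + 1, x + 1
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy + 1)
                left, right = min(left, cx), max(right, cx + 1)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_by_coarse_cells(boxes: List[Box], cuts: List[int]) -> List[Box]:
    """Assign each component to the coarse cell [cuts[i], cuts[i+1]) that
    contains its vertical center, then take the union box per cell."""
    cells = [[] for _ in range(len(cuts) - 1)]
    for box in boxes:
        center = (box[0] + box[2]) / 2
        for i in range(len(cuts) - 1):
            if cuts[i] <= center < cuts[i + 1]:
                cells[i].append(box)
                break
    chars = []
    for members in cells:
        if members:
            chars.append((min(b[0] for b in members), min(b[1] for b in members),
                          max(b[2] for b in members), max(b[3] for b in members)))
    return chars


# One column holding two characters that overlap vertically (rows 3-4),
# so the row projection contains no blank row between them.
column = [
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
cuts = [0, 4, 8]  # hypothetical coarse cut rows, e.g. from an estimated character height
print(merge_by_coarse_cells(connected_components(column), cuts))
# prints [(0, 0, 5, 3), (3, 3, 7, 5)]
```

The two resulting boxes overlap in rows 3 and 4, which a straight horizontal cut could not achieve without slicing a stroke. That is the motivation for refining the coarse cuts with component analysis.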

NI Enzhi, JIANG Minjun, ZHOU Changle. Research on segmentation of historical Chinese books[J]. Computer Engineering and Applications, 2013, 49(2): 29-33.

倪恩志, 蒋旻隽, 周昌乐. 古代汉字文献切分研究[J]. 计算机工程与应用, 2013, 49(2): 29-33.

