Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (9): 133-138. DOI: 10.3778/j.issn.1002-8331.1612-0245

• Pattern Recognition and Artificial Intelligence •

Word extraction from Uyghur handwritten documents

AYSADET·Abliz, HOJAHMAT·Ismayil, KAMIL·Muyidin, ASKAR·Hamdulla

  1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online: 2018-05-01 Published: 2018-05-15

Abstract: To address word segmentation in offline handwritten Uyghur text-line images, this paper proposes a clustering algorithm that fuses fuzzy C-means (FCM) with K-means. The algorithm sorts the gaps in a text line into two classes: within-word distances and between-word distances. Based on the clustering result, text regions are merged to obtain the segmentation points, and the connected components between segmentation points are then labeled and colored. The experiments use 50 offline handwritten Uyghur text images written by different people, containing 536 lines and 4,002 words in total; the correct segmentation rate reaches 80.68%. The results show that the method handles the difficulty caused by irregular spacing between words in handwritten Uyghur, as well as some cases of overlap between adjacent words. It also processes large handwritten text images as a whole.

Key words: Uyghur, handwritten text image, word extraction, clustering, coloring
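
The gap-clustering idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how FCM and K-means are fused, so this example assumes one plausible reading: a two-cluster K-means on the horizontal gaps supplies the initial centers, and FCM then refines them. The input format (one `(x_start, x_end)` extent per connected component), the 0.5 membership threshold, the fuzzifier `m = 2.0`, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int]  # assumed input: (x_start, x_end) of one connected component


def kmeans_1d(values: Sequence[float], iters: int = 50) -> List[float]:
    """Two-cluster K-means on scalars; returns [low_center, high_center]."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        thr = (lo + hi) / 2.0  # in 1-D, the midpoint splits nearest-center regions
        small = [v for v in values if v <= thr]
        large = [v for v in values if v > thr]
        if not small or not large:
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [lo, hi]


def memberships(values: Sequence[float], centers: Sequence[float],
                m: float) -> List[List[float]]:
    """FCM membership of each value in each cluster (each row sums to 1)."""
    p = 2.0 / (m - 1.0)
    rows = []
    for v in values:
        d = [abs(v - c) for c in centers]
        if min(d) == 0.0:
            # Value sits exactly on a center: full membership in that cluster.
            k = d.index(0.0)
            rows.append([1.0 if j == k else 0.0 for j in range(len(d))])
        else:
            inv = [dk ** -p for dk in d]
            total = sum(inv)
            rows.append([x / total for x in inv])
    return rows


def fcm_1d(values: Sequence[float], centers: Sequence[float],
           m: float = 2.0, iters: int = 100, tol: float = 1e-6) -> List[float]:
    """Fuzzy C-means refinement started from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(values, centers, m)
        new = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            new.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers


def segment_words(boxes: Sequence[Box], m: float = 2.0) -> List[List[Box]]:
    """Group component extents of one text line into words by gap clustering."""
    ordered = sorted(boxes)
    if len(ordered) < 2:
        return [list(ordered)] if ordered else []
    gaps = []
    right = ordered[0][1]
    for start, end in ordered[1:]:
        gaps.append(max(0, start - right))  # overlapping components give gap 0
        right = max(right, end)
    if min(gaps) == max(gaps):
        return [list(ordered)]  # no spread to cluster on: treat as one word
    centers = fcm_1d(gaps, kmeans_1d(gaps), m)
    high = centers.index(max(centers))  # cluster of between-word distances
    words = [[ordered[0]]]
    for box, row in zip(ordered[1:], memberships(gaps, centers, m)):
        if row[high] > 0.5:
            words.append([box])    # between-word gap: a segmentation point
        else:
            words[-1].append(box)  # within-word gap: merge into current word
    return words


# Made-up extents; the word index stands in for a color label.
line = [(0, 10), (12, 20), (22, 30), (45, 55), (57, 66), (80, 90)]
for color, word in enumerate(segment_words(line)):
    print(color, word)
```

The sketch clusters the gaps of a single line, so a line that holds only one word but has uneven gaps would still be split. Pooling gaps from several lines before clustering is one possible way around this, though the abstract does not say whether the paper does so.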