Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (9): 133-138. DOI: 10.3778/j.issn.1002-8331.1612-0245

Word extraction from Uyghur handwritten documents

AYSADET·Abliz, HOJAHMAT·Ismayil, KAMIL·Muyidin, ASKAR·Hamdulla   

  1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online: 2018-05-01 Published: 2018-05-15


Abstract: To extract words from offline handwritten Uyghur text-line images, this paper proposes a clustering algorithm that fuses fuzzy C-means (FCM) with K-means. The clustering separates distances into two classes: within-word distances and between-word distances. Guided by the clustering result, text regions are merged to obtain segmentation points, and the text between segmentation points is then labeled by connected-component analysis and colored. The experiments use 50 offline handwritten Uyghur text images written by different people, containing 536 lines and 4,002 words; the correct segmentation rate reaches 80.68%. The results indicate that the proposed method addresses two difficulties in segmenting handwritten Uyghur text: irregular distances between words and overlap between adjacent words. The method also processes a large handwritten text image as a whole.
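
The gap-clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how FCM and K-means are fused, so the sketch assumes one plausible reading, in which two-cluster K-means on the horizontal gaps supplies initial centers that standard FCM then refines. The function names (`kmeans_two`, `memberships`, `fcm_refine`, `split_words`), the `(x_start, x_end)` box format, and the fuzzifier `m = 2` are all hypothetical choices made for illustration.

```python
def kmeans_two(values, iters=50):
    """Two-cluster K-means on scalar gap widths; returns (low, high) centers."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not large:  # every gap falls in a single cluster
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def memberships(values, centers, m=2.0):
    """Standard FCM membership of each value in each cluster."""
    power = -2.0 / (m - 1.0)
    rows = []
    for v in values:
        # Clamp the distance so a gap sitting exactly on a center is safe.
        inv = [max(abs(v - c), 1e-12) ** power for c in centers]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows


def fcm_refine(values, centers, m=2.0, iters=100, tol=1e-6):
    """Refine the given centers with FCM updates (assumed fusion scheme)."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(values, centers, m)
        new = [
            sum(row[k] ** m * v for row, v in zip(u, values))
            / sum(row[k] ** m for row in u)
            for k in range(len(centers))
        ]
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, memberships(values, centers, m)


def split_words(boxes):
    """Group component extents (x_start, x_end) of one text line into words."""
    boxes = sorted(boxes)
    if len(boxes) < 2:
        return [boxes] if boxes else []
    gaps, right = [], boxes[0][1]
    for start, end in boxes[1:]:
        gaps.append(max(0, start - right))  # overlapping components give gap 0
        right = max(right, end)
    centers, u = fcm_refine(gaps, kmeans_two(gaps))
    high = centers.index(max(centers))  # cluster of between-word distances
    words, current = [], [boxes[0]]
    for box, row in zip(boxes[1:], u):
        if row[high] > row[1 - high]:  # between-word gap: segmentation point
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words
```

For example, `split_words([(0, 10), (12, 20), (21, 30), (45, 55), (57, 66), (80, 90)])` has gaps of 2, 1, 15, 2 and 14 pixels, so the sketch places segmentation points at the two large gaps and returns three candidate words. Overlapping components produce a gap of zero and therefore stay in the same word, which is one simple way to handle the overlap the abstract mentions.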

Key words: Uyghur, handwritten text image, word extraction, clustering, coloring
