Word Segmentation in Offline Handwritten Uyghur Text Images

doi:10.3778/j.issn.1002-8331.1612-0245

Computer Engineering and Applications, 2018, Vol. 54, Issue 9: 133-138. DOI: 10.3778/j.issn.1002-8331.1612-0245


Word extraction from Uyghur handwritten documents

AYSADET·Abliz, HOJAHMAT·Ismayil, KAMIL·Muyidin, ASKAR·Hamdulla

Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Online: 2018-05-01 Published: 2018-05-15

Abstract

Abstract: To address word extraction from offline handwritten Uyghur text-line images, this paper proposes a clustering algorithm that fuses FCM (fuzzy C-means) with K-means. The algorithm sorts the measured distances into two classes: within-word distances and between-word distances. Based on the clustering result, text regions are merged to obtain the segmentation points, and the text between segmentation points is then labeled by connected-component analysis and colored. The experiments use 50 offline handwritten Uyghur text images written by different people, containing 536 lines and 4,002 words in total; the correct segmentation rate reaches 80.68%. The results show that the method handles two difficulties in segmenting handwritten Uyghur: the irregular distances between words, and the overlap between some adjacent words. The method also processes a large handwritten text image as a whole.
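The abstract does not spell out how FCM and K-means are fused, so the following is only a minimal sketch under one plausible reading: K-means (k = 2) supplies the initial cluster centres, and fuzzy C-means then refines them and assigns each gap to the within-word or between-word class. The function names (`kmeans_1d`, `fcm_1d`, `classify_gaps`), the fuzzifier `m = 2`, and the sample gap values are all assumptions made for illustration, not details from the paper.

```python
import numpy as np


def kmeans_1d(x, k=2, iters=50):
    """Plain K-means on 1-D data; centres start evenly spread over the range."""
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def fcm_1d(x, centers, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means on 1-D data, started from the given centres."""
    for _ in range(iters):
        # Distance of every point to every centre (epsilon avoids division by zero).
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)      # membership matrix
        um = u ** m
        new = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        converged = np.max(np.abs(new - centers)) < tol
        centers = new
        if converged:
            break
    return centers, u


def classify_gaps(gaps):
    """Return a boolean array: True where a gap is a between-word gap."""
    x = np.asarray(gaps, dtype=float)
    centers = kmeans_1d(x)               # assumed fusion: K-means initialises FCM
    centers, u = fcm_1d(x, centers)
    between = int(np.argmax(centers))    # cluster with the larger centre
    return u.argmax(axis=1) == between


# Hypothetical gap widths (pixels) measured along one text line.
is_between = classify_gaps([2, 3, 2, 4, 15, 3, 2, 18, 3, 2, 16, 2])
```

Seeding FCM with K-means centres is one common way to make the fuzzy clustering less sensitive to initialisation; whether the paper fuses the two algorithms this way cannot be confirmed from the abstract.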

Key words: Uyghur, handwritten text image, word extraction, clustering, coloring
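The merging and coloring steps described in the abstract can be illustrated with a small sketch. It assumes each connected component in a text line is already reduced to a horizontal extent `(x_start, x_end)`, sorted by `x_start`, and that a within-word/between-word decision is available for every gap. The helper names, the treatment of overlapping components as a zero gap, and the color palette are hypothetical choices for illustration; the paper's actual handling of overlapping words is not given in the abstract.

```python
def horizontal_gaps(boxes):
    """Gaps between consecutive components; overlapping components give 0."""
    gaps = []
    right = boxes[0][1]                  # rightmost edge seen so far
    for x_start, x_end in boxes[1:]:
        gaps.append(max(0, x_start - right))
        right = max(right, x_end)
    return gaps


def group_into_words(boxes, is_between):
    """Merge component indices into words; a between-word gap starts a new word."""
    if not boxes:
        return []
    words = [[0]]
    for i, cut in enumerate(is_between, start=1):
        if cut:
            words.append([i])            # segmentation point before component i
        else:
            words[-1].append(i)          # same word as the previous component
    return words


def word_colors(words, palette):
    """Map each component index to its word's color, cycling through the palette."""
    return {
        idx: palette[w % len(palette)]
        for w, word in enumerate(words)
        for idx in word
    }


# Hypothetical component extents for one text line.
boxes = [(0, 10), (12, 20), (18, 25), (40, 52), (54, 60)]
gaps = horizontal_gaps(boxes)
words = group_into_words(boxes, [False, False, True, False])
colors = word_colors(words, ["red", "blue", "green"])
```

Coloring all components of a word identically makes the segmentation result easy to check by eye, since any wrongly merged or wrongly split word shows up as a color mismatch.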

AYSADET·Abliz, HOJAHMAT·Ismayil, KAMIL·Muyidin, ASKAR·Hamdulla. Word extraction from Uyghur handwritten documents[J]. Computer Engineering and Applications, 2018, 54(9): 133-138.
