Application of improved K-means algorithm in Uyghur word-part clustering

Computer Engineering and Applications, 2014, Vol. 50, Issue (14): 135-138.

Section: Databases, Data Mining, Machine Learning


ZHANG Jianzhou, Halmurat·Mamat, CHEN Xiaojiao

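Key Lab of Multilanguage Information Technology, School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China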

Publication date: 2014-07-15   Release date: 2014-08-04


Abstract

In Uyghur character recognition, the quality of clustering directly affects the recognition result. To improve the clustering of Uyghur word-parts (runs of joined letters within a word), an improved K-means clustering algorithm is proposed. The algorithm first selects cluster centers repeatedly with the equal interval method, then chooses the best codebook and dynamically adjusts the number of clusters K using an effective similarity ratio, and finally completes the word-part clustering. Experimental results show that, compared with the traditional K-means algorithm, the improved algorithm produces better clusters, with a clustering accuracy above 90%.

Key words: Uyghur character recognition, word-part, clustering algorithm, equal interval method, effective similarity ratio, accuracy
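The seeding and codebook-selection steps summarized in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes word-parts have already been converted into fixed-length feature vectors, reads the "equal interval method" as taking every (n // K)-th sample starting from a varying offset, and treats the "best codebook" as the run with the lowest within-cluster sum of squared errors (SSE). The helper names (`equal_interval_centers`, `kmeans`, `best_codebook`) are hypothetical. The dynamic adjustment of K by the effective similarity ratio is not sketched, because the abstract does not define that ratio.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], centers: List[Vector]) -> int:
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def equal_interval_centers(data: Sequence[Vector], k: int, offset: int) -> List[Vector]:
    """Assumed reading of the equal interval method: take k samples spaced
    len(data) // k apart, starting at `offset` (0 <= offset < step)."""
    step = len(data) // k
    return [tuple(data[offset + i * step]) for i in range(k)]


def kmeans(data: Sequence[Vector], centers: List[Vector], max_iter: int = 100):
    """Standard Lloyd iterations; returns (centers, labels, SSE)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in data]
        new_centers = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(len(c))))
            else:
                new_centers.append(c)  # keep an empty cluster's previous center
        if new_centers == centers:
            break  # converged: reassignment no longer moves any center
        centers = new_centers
    labels = [nearest(x, centers) for x in data]  # labels consistent with final centers
    sse = sum(sq_dist(x, centers[lab]) for x, lab in zip(data, labels))
    return centers, labels, sse


def best_codebook(data: Sequence[Vector], k: int):
    """Run K-means once per equal-interval offset and keep the codebook with
    the lowest SSE (assumed criterion for the "best codebook")."""
    if k <= 0 or len(data) < k:
        raise ValueError("need 1 <= k <= number of samples")
    step = len(data) // k
    best = None
    for offset in range(step):
        result = kmeans(data, equal_interval_centers(data, k, offset))
        if best is None or result[2] < best[2]:
            best = result
    return best
```

For example, `best_codebook(data, 2)` on two well-separated groups of three 2-D points assigns one label to the first group and another to the second. Running K-means from several deterministic seedings and keeping the lowest-SSE result follows the same idea as multiple random restarts, while keeping the runs reproducible.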

ZHANG Jianzhou, Halmurat·Mamat, CHEN Xiaojiao. Application of improved K-means algorithm in Uyghur word-part clustering[J]. Computer Engineering and Applications, 2014, 50(14): 135-138.

