Preliminary study on Kazak Part-of-Speech automatic tagging

doi:10.3778/j.issn.1002-8331.2008.20.073

Computer Engineering and Applications, 2008, Vol. 44, Issue (20): 242-244.

LIU Yan, GULILA Altenbek, Yiliyaer

College of Information Science & Engineering, Xinjiang University, Urumqi 830046, China

Received: 2008-03-17 Revised: 2008-06-05 Online: 2008-07-11 Published: 2008-07-11
Corresponding author: LIU Yan

Abstract

Abstract: Part-of-speech tagging plays a key role in many information processing tasks. Kazak is one of the minority languages in common use in Xinjiang, and the basic problems of natural language processing for it urgently need to be solved. This paper analyzes the characteristics of Kazak inflectional (form-building) morphemes. Starting from a first-pass, dictionary-based tagging, it uses a statistical method to train the parameters of a bigram HMM and applies the Viterbi algorithm to perform statistical part-of-speech tagging. Finally, a Kazak rule base is used to correct the resulting tags. A comparison test between the purely statistical method and the statistical method supplemented by rule-based correction shows that the latter improves disambiguation accuracy.

Key words: Kazak part-of-speech tagging, inflectional morpheme, bigram, HMM
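The pipeline summarized in the abstract (dictionary-based candidate tags, a bigram HMM trained by counting, Viterbi decoding, then rule-based correction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokens `w1`..`w4`, the tag set `N`/`V`/`ADJ`, the add-one smoothing, and the single correction rule are all hypothetical placeholders, since the abstract does not give the paper's tag set, corpus, or rule base.

```python
import math
from collections import defaultdict

def train_bigram_hmm(tagged_sentences, smoothing=1.0):
    """Estimate bigram-HMM parameters by counting (add-k smoothing is an assumption)."""
    trans = defaultdict(lambda: defaultdict(float))  # trans[prev_tag][tag] counts
    emit = defaultdict(lambda: defaultdict(float))   # emit[tag][word] counts
    tags, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    tags = sorted(tags)

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + smoothing) / (total + smoothing * len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        # One extra vocabulary slot is reserved for unseen words.
        return math.log((emit[tag][word] + smoothing) / (total + smoothing * (len(vocab) + 1)))

    return tags, log_trans, log_emit

def viterbi(words, tags, log_trans, log_emit, lexicon=None):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    lexicon = lexicon or {}
    # First-pass dictionary tagging: restrict each word to its dictionary tags if known.
    cands = [sorted(lexicon.get(w, tags)) for w in words]
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in cands[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in cands[i]:
            score, prev = max((best[i - 1][p][0] + log_trans(p, t), p) for p in cands[i - 1])
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)
    # Backtrace from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def apply_rules(words, tags, rules):
    """Post-correct tags; each rule returns a replacement tag or None."""
    fixed = list(tags)
    for i in range(len(words)):
        for rule in rules:
            new_tag = rule(words, fixed, i)
            if new_tag is not None:
                fixed[i] = new_tag
                break
    return fixed

def adj_then_verb_rule(words, tags, i):
    """Hypothetical rule for illustration only: retag V as N directly after ADJ."""
    if i > 0 and tags[i - 1] == "ADJ" and tags[i] == "V":
        return "N"
    return None

# Toy corpus with placeholder tokens (not real Kazak data).
corpus = [
    [("w1", "N"), ("w2", "V")],
    [("w3", "ADJ"), ("w1", "N"), ("w2", "V")],
    [("w3", "ADJ"), ("w4", "N")],
    [("w1", "N"), ("w4", "V")],
]
lexicon = {"w1": {"N"}, "w2": {"V"}, "w3": {"ADJ"}, "w4": {"N", "V"}}

tags, log_trans, log_emit = train_bigram_hmm(corpus)
statistical = viterbi(["w3", "w4"], tags, log_trans, log_emit, lexicon)
corrected = apply_rules(["w3", "w4"], statistical, [adj_then_verb_rule])
print(statistical, corrected)
```

In this sketch the ambiguous token `w4` is resolved by its left context alone, which is what a bigram model provides. The rule pass runs after decoding, so it can only override individual tags and does not change the HMM scores.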

Cite this article: LIU Yan, GULILA Altenbek, Yiliyaer. Preliminary study on Kazak Part-of-Speech automatic tagging[J]. Computer Engineering and Applications, 2008, 44(20): 242-244.
