Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue (20): 242-244. DOI: 10.3778/j.issn.1002-8331.2008.20.073

• Engineering and Applications •

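Preliminary study on Kazak part-of-speech automatic tagging (哈萨克语词性自动标注研究初探)

LIU Yan (刘艳), GULILA Altenbek (古丽拉·阿东别克), Yiliyaer (伊力亚尔)

  1. College of Information Science and Engineering, Xinjiang University (新疆大学 信息科学与工程学院), Urumqi 830046
  • Received: 2008-03-17 Revised: 2008-06-05 Online: 2008-07-11 Published: 2008-07-11
  • Corresponding author: LIU Yan (刘艳)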

Preliminary study on Kazak Part-of-Speech automatic tagging

LIU Yan, GULILA Altenbek, Yiliyaer

  1. College of Information Science & Engineering, Xinjiang University, Urumqi 830046, China
  • Received: 2008-03-17 Revised: 2008-06-05 Online: 2008-07-11 Published: 2008-07-11
  • Contact: LIU Yan

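Abstract: Part-of-speech tagging plays a key role in many stages of information processing. Kazak is one of the minority languages in common use in the Xinjiang region, so the foundational problems of natural language processing are equally pressing for it. The paper analyzes the features of Kazak inflectional (form-building) morphemes. Starting from a first-level tagging based on a dictionary, it uses statistical methods to train the parameters of a bi-gram HMM and applies the Viterbi algorithm to perform statistical part-of-speech tagging. A Kazak rule base is then used to correct the tagging. The purely statistical method and the method that is mainly statistical with supplementary rule-based correction were tested against each other, and the results show that the latter improves disambiguation accuracy.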

Key words: Kazak part-of-speech tagging, inflectional morphemes, bi-gram, HMM

Abstract: Part-of-speech tagging plays a key role in many information-processing tasks. Kazak is one of the minority languages in common use in Xinjiang, and basic problems in its natural language processing urgently need to be solved. This paper analyzes the characteristics of Kazak inflectional morphemes. Building on a first-level, dictionary-based tagging pass, it uses statistical methods to train the parameters of a bi-gram HMM and applies the Viterbi algorithm to complete statistical part-of-speech tagging. Finally, a Kazak rule base is used to correct the tagging results. A purely statistical method is compared in tests with a method that is primarily statistical and supplemented by rule-based correction; the results show that the latter improves disambiguation accuracy.

Key words: Kazak part-of-speech tagging, inflectional morphemes, bi-gram, HMM
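
The pipeline named in the abstract (dictionary lookup for candidate tags, a bi-gram HMM trained from counts, Viterbi decoding, then rule-based correction) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the add-one smoothing, the form of a rule, and the toy corpus are all assumptions made for illustration; the tokens "a", "b", "c" are placeholders, not Kazak data.

```python
import math
from collections import Counter, defaultdict

def train_bigram_hmm(tagged_sentences):
    """Estimate bi-gram HMM counts from sentences of (word, tag) pairs."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag]: transition counts
    emit = defaultdict(Counter)    # emit[tag][word]: emission counts
    lexicon = defaultdict(set)     # word -> candidate tags (dictionary step)
    for sent in tagged_sentences:
        prev = "<s>"               # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            lexicon[word].add(tag)
            prev = tag
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    return trans, emit, lexicon, tags, vocab

def log_prob(counter, key, n_outcomes):
    # Add-one smoothing: an assumption of this sketch, not taken from the paper.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence under the bi-gram HMM."""
    if not words:
        return []
    trans, emit, lexicon, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1   # +1 slot for unknown words
    # Candidate tags come from the dictionary; unknown words may take any tag.
    cands = [sorted(lexicon[w]) if w in lexicon else tags for w in words]
    best = [{}]                    # best[i][tag] = (log score, previous tag)
    for t in cands[0]:
        score = log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in cands[i]:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in cands[i - 1]
            )
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):   # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def apply_rules(words, tags, rules):
    """Post-correct tags; each rule returns a replacement tag or None."""
    tags = list(tags)
    for i in range(len(words)):
        for rule in rules:
            new_tag = rule(words, tags, i)
            if new_tag is not None:
                tags[i] = new_tag
    return tags

# Toy corpus with placeholder tokens; "b" is ambiguous between N and V.
corpus = [
    [("a", "N"), ("b", "V")],
    [("a", "N"), ("c", "ADJ"), ("b", "N")],
    [("c", "ADJ"), ("a", "N"), ("b", "V")],
]
model = train_bigram_hmm(corpus)
print(viterbi(["a", "b"], model))   # ['N', 'V']
print(viterbi(["c", "b"], model))   # ['ADJ', 'N']
```

In this toy setting the ambiguous token "b" is tagged V after a noun and N after an adjective, which is the kind of context-based disambiguation a bi-gram model provides. A correction rule would be passed to `apply_rules` as a function of `(words, tags, i)`; for instance, a hypothetical suffix test could force a tag on words with a given ending, loosely mirroring the role the abstract assigns to the rule base.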