Improved hidden Markov models used in Kazakh part-of-speech tagging

doi:10.3778/j.issn.1002-8331.2010.36.040

Computer Engineering and Applications, 2010, Vol. 46, Issue (36): 147-149. DOI: 10.3778/j.issn.1002-8331.2010.36.040

• Database, Signal and Information Processing •

HOU Cheng-feng, Gulila·Altenbek

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Received: 2010-07-21 Revised: 2010-09-10 Online: 2010-12-21 Published: 2010-12-21
Contact: HOU Cheng-feng

Abstract

Abstract: Part-of-speech (POS) tagging of Kazakh plays a key role in natural language processing and is the basis of syntactic analysis, information retrieval, and machine translation. Building on the traditional HMM, this work improves the computation of the HMM parameters, the data smoothing, and the handling of unknown (out-of-vocabulary) words, so that the model better reflects the contextual dependencies between words. A statistical method is used to train the model on a tagged Kazakh corpus, and the Viterbi algorithm is then used to perform POS tagging. The experimental results show that the improved HMM tags parts of speech better than the traditional HMM.

Key words: Hidden Markov Models (HMM), Kazakh, part-of-speech tagging
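The pipeline the abstract describes (count-based training, data smoothing, unknown-word handling, Viterbi decoding) can be illustrated with a minimal sketch. This is a generic bigram HMM with add-alpha smoothing, not the authors' improved model, whose details the abstract does not give. The function names `train_hmm` and `viterbi`, the `alpha` parameter, the single shared slot for unknown words, and the toy English corpus are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate add-alpha smoothed log-probabilities from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag] counts
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word] counts
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot shared by all unknown words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev].get(tag, 0) + alpha) / (total + alpha * n_tags))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        count = emit[tag].get(word, 0) if word in vocab else 0
        return math.log((count + alpha) / (total + alpha * n_words))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    # best[i][t]: best log-probability of a tag path for words[:i+1] ending in t
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans(p, t))
            best[i][t] = best[i - 1][prev] + log_trans(prev, t) + log_emit(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy corpus (English placeholders, not data from the paper)
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
tags, log_trans, log_emit = train_hmm(corpus)
print(viterbi(["the", "cat", "runs"], tags, log_trans, log_emit))
print(viterbi(["the", "zebra", "sleeps"], tags, log_trans, log_emit))  # "zebra" is unseen
```

In this sketch an unseen word receives the same small emission probability under every tag, so its tag is decided by the transition probabilities alone. An improved model of the kind the abstract reports would presumably replace this step with something more informative.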

CLC Number: TP391.1

HOU Cheng-feng, Gulila·Altenbek. Improved hidden Markov models used in Kazakh part-of-speech tagging[J]. Computer Engineering and Applications, 2010, 46(36): 147-149.
