Kazakh part-of-speech tagging method based on maximum entropy

Computer Engineering and Applications, 2013, Vol. 49, Issue (11): 126-129.

SANG Haiyan1,2, Gulia·Altenbek1,2, NIU Ningning1,2

1.College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2.The Base of Kazakh and Kirghiz Language, Minority Languages Branch, National Language Resource Monitoring and Research Center, Urumqi 830046, China

Online: 2013-06-01  Published: 2013-06-14

Abstract

The maximum entropy model can make full use of context and flexibly incorporate multiple features. This paper applies the maximum entropy model to part-of-speech tagging of Kazakh, designs feature templates according to the language's agglutinative nature and rich morphology, and adds a feature template with backward dependency on part-of-speech tags. The model is also improved at the decoding stage: the n most probable tags of each word are each added to the feature vector of the next word, and so on until the end of the sentence; finally, the tag sequence with the highest overall probability is selected. Experimental results show that the feature templates are well chosen and that the improved model reaches an accuracy of 96.8%.
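The tagging pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag set (N, V, ADJ), the feature templates (word form, suffixes of 1 to 3 characters, neighbouring words, previous tag), the function names and the hand-set `WEIGHTS` are all hypothetical toy values, whereas the paper trains its weights on an annotated corpus (training is omitted here). The conditional model is the standard maximum entropy form p(t | h) = exp(Σ_i λ_i f_i(h, t)) / Z(h). The improved decoding step is read here as a beam search of width n, in which each of the n best partial tag sequences feeds its last tag into the next word's feature vector.

```python
import math

# Hypothetical feature templates (illustrative only, not the paper's exact set).
def extract_features(words, i, prev_tag):
    w = words[i]
    feats = [f"w={w}"]
    # Suffix features approximate Kazakh's agglutinative morphology.
    for k in (1, 2, 3):
        if len(w) > k:
            feats.append(f"suf{k}={w[-k:]}")
    feats.append(f"prev_w={words[i - 1] if i > 0 else '<s>'}")
    feats.append(f"next_w={words[i + 1] if i + 1 < len(words) else '</s>'}")
    # Tag-dependency feature: the tag assigned to the previous word.
    feats.append(f"prev_t={prev_tag}")
    return feats

def maxent_probs(feats, tags, weights):
    """p(t | h) = exp(sum_i lambda_i * f_i(h, t)) / Z(h)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def beam_decode(words, tags, weights, n=3):
    """Keep the n most probable partial tag sequences at each word."""
    beam = [(0.0, [])]  # entries: (log-probability, tag sequence so far)
    for i in range(len(words)):
        candidates = []
        for logp, seq in beam:
            prev_tag = seq[-1] if seq else "<s>"
            feats = extract_features(words, i, prev_tag)
            for t, p in maxent_probs(feats, tags, weights).items():
                candidates.append((logp + math.log(p), seq + [t]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:n]
    return beam[0][1]  # highest-probability complete sequence

# Toy, hand-set weights (hypothetical; a real model learns these from data).
TAGS = ["N", "V", "ADJ"]
WEIGHTS = {
    ("w=жақсы", "ADJ"): 2.5,   # lexical cue: "жақсы" (good)
    ("suf3=лар", "N"): 2.0,    # plural suffix -лар
    ("suf2=ді", "V"): 2.0,     # past-tense suffix -ді
    ("prev_t=ADJ", "N"): 1.5,  # an adjective tends to precede a noun
    ("prev_t=N", "V"): 1.0,    # a noun subject tends to precede a verb
}

if __name__ == "__main__":
    # "жақсы балалар келді" = "good children came"
    print(beam_decode(["жақсы", "балалар", "келді"], TAGS, WEIGHTS, n=2))
    # prints ['ADJ', 'N', 'V']
```

With n=1 the decoder reduces to greedy left-to-right tagging; a larger n lets a locally less probable tag survive when a later word's evidence favours it, which is the motivation the abstract gives for keeping several candidate tags per word.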

Key words: natural language processing, part-of-speech tagging, maximum entropy model, Kazakh

SANG Haiyan, Gulia·Altenbek, NIU Ningning. Kazakh part-of-speech tagging method based on maximum entropy[J]. Computer Engineering and Applications, 2013, 49(11): 126-129.