Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (31): 135-138.DOI: 10.3778/j.issn.1002-8331.2010.31.038
• Database, Signal and Information Processing •
FENG Jing-hua,Gulila·Altenbek,Mayra·Hapar
冯鲸华,古丽拉·阿东别克,玛依来·哈帕尔
Abstract: Based on the compositional characteristics of organization names in Kazakh text, this paper proposes an effective method for computing the confidence of Kazakh organization names with an N-gram language model. Using the tail words of organization names as trigger words, it builds a Kazakh organization name recognition system consisting of a training module and a recognition module. Features are first extracted from the training corpus and used to train a feature model. The trained model, together with a small set of additional rules, is then used to recognize organization names in the test corpus. Experimental results show that the method is feasible.
Key words: N-gram model, Kazakh organization name recognition, named entity recognition
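
The abstract does not give the confidence formula or the rule set, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bigram model with add-one smoothing, a confidence defined as the geometric mean of the bigram probabilities, and a fixed acceptance threshold. The function names, the `max_len` and `threshold` parameters, and the English stand-in tokens are all hypothetical.

```python
import math
from collections import Counter

BOS = "<s>"  # start-of-name marker (an assumption of this sketch)


def train_bigram(org_names):
    """Training module: count bigrams over tokenized organization names."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in org_names:
        seq = [BOS] + list(tokens)
        contexts.update(seq[:-1])          # every token that has a successor
        bigrams.update(zip(seq, seq[1:]))  # (previous, current) pairs
        vocab.update(tokens)
    return contexts, bigrams, vocab


def confidence(tokens, model):
    """Geometric mean of add-one-smoothed bigram probabilities (assumed formula)."""
    contexts, bigrams, vocab = model
    seq = [BOS] + list(tokens)
    log_p = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        log_p += math.log(p)
    return math.exp(log_p / len(tokens))


def recognize(sentence, triggers, model, threshold=0.25, max_len=4):
    """Recognition module: extend leftwards from each trigger tail word
    and keep the highest-confidence span if it clears the threshold."""
    found = []
    for end, word in enumerate(sentence):
        if word not in triggers:
            continue
        first_start = max(0, end - max_len + 1)
        candidates = [sentence[s:end + 1] for s in range(first_start, end + 1)]
        best = max(candidates, key=lambda span: confidence(span, model))
        score = confidence(best, model)
        if score >= threshold:
            found.append((best, score))
    return found


# Toy usage with English stand-in tokens (real input would be segmented Kazakh text).
names = [["kazakh", "national", "university"],
         ["xinjiang", "university"],
         ["national", "bank"]]
model = train_bigram(names)
print(recognize(["he", "visited", "xinjiang", "university", "yesterday"],
                {"university", "bank"}, model))
```

Scanning leftwards from the tail word reflects the abstract's use of tail words as trigger words; the additional rules the abstract mentions are not modeled here.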
CLC Number: TP391
FENG Jing-hua,Gulila·Altenbek,Mayra·Hapar. Kazakh organization name recognition based on N-gram model[J]. Computer Engineering and Applications, 2010, 46(31): 135-138.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2010.31.038
http://cea.ceaj.org/EN/Y2010/V46/I31/135