Computer Engineering and Applications, 2010, Vol. 46, Issue (31): 135-138. DOI: 10.3778/j.issn.1002-8331.2010.31.038

• Database, Signal and Information Processing •

Kazakh organization name recognition based on N-gram model

FENG Jing-hua, Gulila·Altenbek, Mayra·Hapar

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Received: 2010-01-07 Revised: 2010-04-20 Online: 2010-11-01 Published: 2010-11-01
  • Corresponding author: FENG Jing-hua

Abstract: Based on the compositional characteristics of organization names in Kazakh text, this paper proposes a method for computing the confidence of Kazakh organization names using an N-gram language model. Taking the tail words of organization names as trigger words, it builds a Kazakh organization name recognition system. The system consists of a training module and a recognition module. Features are first extracted from the training corpus and used to train a feature model. The trained model, together with a small number of additional rules, is then used to recognize organization names in the test text. The experimental results show that the method is feasible.

Key words: N-gram model, Kazakh organization name recognition, named entity recognition
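
The approach summarized in the abstract (an N-gram confidence score, with tail words acting as triggers) can be illustrated with a minimal sketch. This is not the authors' implementation: the bigram order, add-one smoothing, the geometric-mean confidence, the leftward candidate search, the threshold, and all names (`BigramModel`, `recognize`, `TAIL_WORDS`) are assumptions chosen for illustration, and English placeholder tokens stand in for Kazakh words. The paper's additional rules are not modeled.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical start-of-name marker


class BigramModel:
    """Bigram model over organization names (illustrative assumption)."""

    def __init__(self, names):
        self.context = Counter()   # how often a word appears as a left context
        self.bigrams = Counter()   # counts of (previous word, current word)
        vocab = set()
        for name in names:
            seq = [BOS] + list(name)
            vocab.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.context[prev] += 1
                self.bigrams[(prev, cur)] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def prob(self, prev, cur):
        # Add-one smoothing so unseen bigrams keep a small nonzero probability.
        return (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + self.vocab_size)

    def confidence(self, words):
        # Geometric mean of bigram probabilities, so that candidates of
        # different lengths can be compared on the same scale.
        seq = [BOS] + list(words)
        log_p = sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))
        return math.exp(log_p / len(words))


def recognize(tokens, model, tail_words, max_len=4, threshold=0.15):
    """Return (candidate, confidence) pairs for spans ending in a tail word."""
    found = []
    for end, tok in enumerate(tokens):
        if tok not in tail_words:
            continue  # only tail words trigger a candidate search
        best = None
        # Extend leftward from the trigger; a tail word alone is not a candidate.
        for start in range(max(0, end - max_len + 1), end):
            cand = tokens[start:end + 1]
            score = model.confidence(cand)
            if best is None or score > best[1]:
                best = (cand, score)
        if best is not None and best[1] >= threshold:
            found.append(best)
    return found


# Toy training data (hypothetical, not from the paper's corpus).
TRAIN_NAMES = [
    ["xinjiang", "university"],
    ["national", "university"],
    ["xinjiang", "bank"],
    ["national", "science", "academy"],
]
TAIL_WORDS = {"university", "bank", "academy"}

model = BigramModel(TRAIN_NAMES)
hits = recognize(["he", "visited", "xinjiang", "university", "yesterday"],
                 model, TAIL_WORDS)
print(hits)
```

Normalizing by length keeps longer candidates from being penalized merely for containing more words. A real system would tune the threshold and the maximum candidate length on held-out data.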
