Computer Engineering and Applications, 2019, Vol. 55, Issue 18: 111-115. DOI: 10.3778/j.issn.1002-8331.1903-0232

• Pattern Recognition and Artificial Intelligence •

Research on Tibetan Location Name Recognition Technology Based on CRF

Thupten Tsering, Rinchen Dhondub, Nyima Tashi   

  1. School of Information Science and Technology, Tibet University, Lhasa 850000, China
  2. Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining 810008, China
  • Online: 2019-09-15  Published: 2019-09-11

Abstract: Location name recognition is a problem that Tibetan named entity recognition must solve. By analyzing the characteristics of Tibetan location names and the difficulties in recognizing them, this paper argues that features such as syllables, trigger words, words that follow location names, and case particles make the task well suited to a CRF-based model. Experiments verify the effectiveness of six features for Tibetan location name recognition. With this method, precision, recall and F-measure reach 96.12%, 81.92% and 88.45%, respectively, which the authors report as better than existing systems.

Key words: CRF model, Tibetan location names, location name recognition
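
The three figures reported in the abstract can be cross-checked against one another. The sketch below assumes that the first figure (96.12%) is precision, that the second (81.92%) is recall, and that the reported F value is the balanced F1 score, i.e. their harmonic mean; the abstract does not state the formula, and the function name `f_measure` is illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Figures from the abstract, in percent (assumed to be precision and recall).
reported_precision = 96.12
reported_recall = 81.92

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 88.45%"
```

Under these assumptions the computed value matches the reported 88.45%, which suggests the F value in the abstract is indeed the balanced F1 of the other two figures.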