Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (15): 172-176.

• Signal Processing •

Tibetan Named Entity Recognition Based on the Perceptron Model (基于感知机模型藏文命名实体识别)

HUA Quecairang1,2, JIANG Wenbin3, ZHAO Haixing1, LIU Qun3

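  1. School of Computer Science, Shaanxi Normal University, Xi'an 710062
  2. Tibetan Information Research Center, Qinghai Normal University, Xining 810008
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190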
  • Publication date: 2014-08-01  Release date: 2014-08-04

Tibetan name entity recognition with perceptron model

HUA Quecairang1,2, JIANG Wenbin3, ZHAO Haixing1, LIU Qun3   

  1. Computer Science School of Shaanxi Normal University, Xi’an 710062, China
  2. Tibetan Information Research Center, Qinghai Normal University, Xining 810008, China
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2014-08-01  Published: 2014-08-04

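Abstract: Named entity recognition is a problem that Tibetan word segmentation and tagging systems must solve. By analyzing the word-formation patterns of named entities and the ambiguities that arise in segmentation, this paper proposes a Tibetan named entity recognition scheme based on a perceptron model trained on syllable features. The work focuses on a method for identifying syllables using Tibetan contracted case markers (紧缩格), the training feature templates for syllables inside and at the boundaries of named entities, model training, and the method for classifying recognized named entities. The proposed method achieves an F-score of 86.03% on the test set, 10.5 percentage points higher than a segmentation-based baseline system.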

Keywords: Tibetan syllable, named entity, Tibetan named entity, perceptron model

Abstract: Tibetan named entity (NE) recognition is essential for Tibetan word segmentation and part-of-speech tagging. Based on a detailed analysis of NE formation rules and word segmentation ambiguity, this paper proposes a perceptron model trained on syllable features to identify Tibetan NEs. It focuses on Tibetan syllable segmentation, the training feature templates for syllables inside and at the boundaries of NEs, model training, and the NE classification method. The method achieves an F-score of 86.03% on the test set, 10.5 percentage points higher than the segmentation-based baseline system.

Key words: Tibetan syllable, named entity (NE), Tibetan NE, perceptron model
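
The abstract describes a perceptron trained on syllable-level features and evaluated by F-score, but it does not give the feature templates or the tag set. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the B-/I-/O tag scheme, the five feature templates in `features`, the greedy per-syllable decoding, and the toy training data, which uses pre-split, Wylie-style romanized syllables rather than the paper's syllable identification method.

```python
from collections import defaultdict

# Assumed tag set (not from the paper): B-/I- mark entity start/inside, O is outside.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def features(syls, i):
    """Hypothetical feature templates: current syllable and its neighbours."""
    prev = syls[i - 1] if i > 0 else "<s>"
    nxt = syls[i + 1] if i + 1 < len(syls) else "</s>"
    cur = syls[i]
    return [f"cur={cur}", f"prev={prev}", f"next={nxt}",
            f"prev+cur={prev}|{cur}", f"cur+next={cur}|{nxt}"]


class Perceptron:
    """Multiclass perceptron that tags each syllable independently."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.w = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.w[(f, tag)] for f in feats)

    def best_tag(self, feats):
        # Ties are broken in favour of the earliest tag in self.tags.
        return max(self.tags, key=lambda t: self.score(feats, t))

    def predict(self, syls):
        return [self.best_tag(features(syls, i)) for i in range(len(syls))]

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for syls, gold in data:
                for i, g in enumerate(gold):
                    feats = features(syls, i)
                    p = self.best_tag(feats)
                    if p != g:  # standard perceptron update on a mistake
                        for f in feats:
                            self.w[(f, g)] += 1.0
                            self.w[(f, p)] -= 1.0


def spans(tags):
    """Collect (start, end, type) entity spans from a B-/I-/O tag sequence."""
    out, start, kind = set(), None, None
    for i, t in enumerate(list(tags) + ["O"]):
        if start is not None and not (t.startswith("I-") and t[2:] == kind):
            out.add((start, i, kind))
            start, kind = None, None
        if t.startswith("B-"):
            start, kind = i, t[2:]
    return out


def f_score(gold, pred):
    """Span-level F-score: harmonic mean of precision and recall."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    prec, rec = tp / len(p), tp / len(g)
    return 2 * prec * rec / (prec + rec)


# Toy data (hypothetical): romanized syllables with hand-assigned tags.
GOLD = ["B-PER", "I-PER", "B-LOC", "I-LOC", "O", "O"]
DATA = [
    (["bkra", "shis", "lha", "sa", "la", "song"], GOLD),
    (["sgrol", "ma", "lha", "sa", "la", "yod"], GOLD),
]

model = Perceptron(TAGS)
model.train(DATA)
print(model.predict(["sgrol", "ma", "lha", "sa", "la", "song"]))
```

Scoring whole spans rather than individual syllables means a partially recognized entity counts as an error, which is the usual convention for reporting NE F-scores. Whether the paper scores exactly this way is not stated in the abstract.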