Computer Engineering and Applications, 2024, Vol. 60, Issue (22): 162-171. DOI: 10.3778/j.issn.1002-8331.2307-0395

Pattern Recognition and Artificial Intelligence

Chinese Named Entity Recognition Based on External Knowledge and Position Information

LI Yuan, Luosang Gadeng, JIANG Weili   

1. Faculty of Information Engineering, Xinyang Agriculture and Forestry University, Xinyang, Henan 464000, China
2. School of Information Science and Technology, Tibet University, Lhasa 850000, China
3. College of Computer Science, Sichuan University, Chengdu 610207, China
Online: 2024-11-15; Published: 2024-11-14

Abstract: Named entity recognition (NER) is an important and fundamental task in information retrieval and natural language processing. Unlike English, Chinese text has no explicit word boundaries, so most existing Chinese NER methods suffer from Chinese word segmentation (CWS) problems as well as a lack of domain knowledge. To address these problems, this paper proposes a Chinese NER model, built on the Lattice structure, that combines knowledge graph embedding (KGE) with masked position information to enhance Lattice semantics. The Lattice information provides a structural foundation for supplementing word-level information and addressing the CWS problem. The KGE supplements and locates the domain knowledge that the pre-trained language model lacks. The masked position information effectively mitigates the knowledge noise introduced by incorporating the knowledge graph. The proposed method performs well in both the general domain and specific (vertical) domains, achieving F1 scores of 74.01%, 96.62%, and 94.95% on Weibo, Resume, and CCKS 2017, respectively.
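The abstract does not spell out the architecture, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. It builds a flat lattice in the style of FLAT, where each character and each lexicon word matched in the sentence carries head and tail position indices. It then attaches knowledge-graph triples in the style of K-BERT, giving the injected tokens soft positions and a visibility mask that confines each injected token to its anchor entity so it cannot perturb unrelated tokens (the "knowledge noise" problem). The toy lexicon, the toy knowledge graph, and the functions `build_flat_lattice`, `inject_knowledge`, and `visible_matrix` are hypothetical and are not the authors' implementation. Knowledge is also injected here as plain tokens, a simplification of the learned knowledge graph embeddings the paper uses.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (token, head index, tail index) in character positions

# Hypothetical toy resources, for illustration only.
LEXICON: Set[str] = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
KG: Dict[str, List[str]] = {"长江大桥": ["位于", "南京"]}  # entity -> (relation, object) tokens


def build_flat_lattice(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[Span]:
    """FLAT-style flat lattice: every character (head == tail) plus every matched lexicon word."""
    spans = [(ch, i, i) for i, ch in enumerate(sentence)]
    for i in range(len(sentence)):
        # Try all multi-character substrings starting at i, up to max_word_len characters.
        for j in range(i + 2, min(len(sentence), i + max_word_len) + 1):
            word = sentence[i:j]
            if word in lexicon:
                spans.append((word, i, j - 1))
    return spans


def inject_knowledge(spans: List[Span], kg: Dict[str, List[str]]) -> Tuple[List[Span], List[int]]:
    """Append knowledge tokens after each matched entity; owner[k] is the anchor span index or -1."""
    tokens = list(spans)
    owner = [-1] * len(spans)  # -1 marks tokens from the original sentence lattice
    for idx, (tok, _head, tail) in enumerate(spans):
        for k, ktok in enumerate(kg.get(tok, [])):
            soft_pos = tail + 1 + k  # assumption: soft position continues from the anchor's tail
            tokens.append((ktok, soft_pos, soft_pos))
            owner.append(idx)
    return tokens, owner


def visible_matrix(owner: List[int]) -> List[List[bool]]:
    """mask[i][j] is True if token i may attend to token j."""

    def visible(i: int, j: int) -> bool:
        if owner[i] == -1 and owner[j] == -1:
            return True  # sentence lattice tokens see each other
        if owner[i] != -1 and owner[i] == owner[j]:
            return True  # tokens in the same knowledge branch see each other
        return j == owner[i] or i == owner[j]  # a branch and its anchor see each other

    n = len(owner)
    return [[visible(i, j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    lattice = build_flat_lattice("南京市长江大桥", LEXICON)
    tokens, owner = inject_knowledge(lattice, KG)
    mask = visible_matrix(owner)
    for (tok, head, tail), row in zip(tokens, mask):
        print(f"{tok}\t{head}-{tail}\t{sum(row)} visible")
```

In this toy run, the injected tokens 位于 and 南京 are visible only to each other and to their anchor 长江大桥, so they cannot shift the representation of a competing lattice word such as 市长. In a Transformer, the False entries would typically become a large negative additive bias on the attention logits before the softmax. In longer sentences, soft positions can coincide with positions of later characters. K-BERT intends that overlap and relies on the mask to keep the branches apart.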

Key words: Lattice, knowledge graph embedding, position information, Chinese named entity recognition
