Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (16): 1-15. DOI: 10.3778/j.issn.1002-8331.2103-0127

Overview of Chinese Domain Named Entity Recognition

JIAO Kainan, LI Xin, ZHU Rongchen   

  1. School of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  2. China Security Prevention Technology and Risk Assessment Key Laboratory of Ministry of Public Security, Beijing 100026, China
  • Online: 2021-08-15 Published: 2021-08-16

中文领域命名实体识别综述

焦凯楠,李欣,朱容辰   

  1. 中国人民公安大学 信息网络安全学院,北京 100038
  2. 安全防范技术与风险评估公安部重点实验室,北京 100026

Abstract:

Named Entity Recognition (NER), a classic research topic in natural language processing, is a foundational technology for tasks such as intelligent question answering and knowledge graphs. Domain Named Entity Recognition (DNER) is NER applied to a specific domain. Driven by deep learning, Chinese DNER has made breakthrough progress. First, this paper summarizes the research framework of Chinese DNER and reviews existing research from four aspects: the determination of domain data sources, the establishment of domain entity types and specifications, the annotation of domain data sets, and the evaluation metrics of Chinese DNER. Then, it summarizes the common technical frameworks for Chinese DNER, introducing pattern matching methods based on dictionaries and rules, statistical machine learning methods, deep learning methods, and multi-party fusion deep learning methods, with a focus on Chinese DNER methods based on word vector representation and deep learning. Finally, it discusses typical application scenarios of Chinese DNER and outlines directions for future development.

Key words: natural language processing, Chinese domain named entity recognition, deep learning

摘要:

命名实体识别(Named Entity Recognition,NER)作为自然语言处理领域经典的研究主题,是智能问答、知识图谱等任务的基础技术。领域命名实体识别(Domain Named Entity Recognition,DNER)是面向特定领域的NER方案。在深度学习技术的推动下,中文DNER取得了突破性进展。概括了中文DNER的研究框架,从领域数据源的确定、领域实体类型及规范制定、领域数据集的标注规范、中文DNER评估指标四个角度对国内外已有研究成果进行了综合评述;总结了目前常见的中文DNER的技术框架,介绍了基于词典和规则的模式匹配方法、统计机器学习方法、基于深度学习的方法、多方融合的深度学习方法,并重点分析了基于词向量表征和深度学习的中文DNER方法;讨论了中文DNER的典型应用场景,对未来发展方向进行了展望。

关键词: 自然语言处理, 中文领域命名实体识别, 深度学习