Computer Engineering and Applications, 2021, Vol. 57, Issue (7): 151-157. DOI: 10.3778/j.issn.1002-8331.1912-0430

Chinese Named Entity Recognition Based on Denoising Joint Character-Word Model

YANG Qian, GU Lei   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Online: 2021-04-01 Published: 2021-04-02

基于去噪字词联合模型的中文命名实体识别

杨倩,顾磊   

  1. 南京邮电大学 计算机学院,南京  210023

Abstract:

Chinese named entity recognition (NER) is a basic task in Chinese information processing that provides technical support for relation extraction, entity linking, and knowledge graphs. Compared with traditional named entity recognition methods, models based on Bidirectional Long Short-Term Memory (BiLSTM) neural networks have achieved good results on Chinese NER. To address the inaccurate feature extraction of the BiLSTM-CRF model based on joint character-word learning, a gated denoising mechanism is introduced on top of that model. The mechanism fine-tunes the input character vectors, automatically learning to filter out or down-weight unimportant character information in the text while retaining the information more useful for Chinese NER, thereby improving the recognition rate of named entities. Test results on the Resume and Weibo datasets show that this method effectively improves Chinese NER performance.
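
The abstract does not give the gate's formula, so the following is only a minimal sketch of what a gated denoising step on a character vector might look like, assuming a sigmoid gate g = sigmoid(W x + b) applied element-wise to the input vector x. The names `gated_denoise` and `init_params`, the square weight matrix, and the initialization range are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_params(dim, seed=0):
    """Hypothetical initialization: small random weights and a zero bias."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def gated_denoise(char_vec, weights, bias):
    """Scale each component of a character vector by a learned gate.

    Assumed form (not taken from the paper): gate = sigmoid(W x + b),
    output = gate * x element-wise. Gate values near 0 suppress a
    component; values near 1 pass it through almost unchanged.
    """
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, char_vec)) + b)
        for row, b in zip(weights, bias)
    ]
    denoised = [g * x for g, x in zip(gate, char_vec)]
    return denoised, gate


# Example: gate a toy 3-dimensional character vector.
weights, bias = init_params(3)
denoised, gate = gated_denoise([0.5, -1.0, 2.0], weights, bias)
```

In a full model the gate parameters would be trained jointly with the BiLSTM-CRF, so that the network itself learns which character information to down-weight; here they are fixed random values purely for illustration.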

Keywords: joint character-word, denoising mechanism, Long Short-Term Memory (LSTM), Chinese named entity recognition
