Computer Engineering and Applications, 2022, Vol. 58, Issue (18): 227-232. DOI: 10.3778/j.issn.1002-8331.2102-0130

• Pattern Recognition and Artificial Intelligence •

Chinese Named Entity Recognition Combined with Gazetteers and Syntactic Dependency Tree

FANG Hong, SU Ming, FENG Yibo, ZHANG Lan   

  1. College of Arts and Sciences, Shanghai Polytechnic University, Shanghai 201209, China
  2. College of Engineering, Shanghai Polytechnic University, Shanghai 201209, China
  3. College of Mathematics and Statistics, Kashgar University, Kashgar, Xinjiang 844000, China
  • Online:2022-09-15 Published:2022-09-15

Abstract: Chinese named entity recognition plays an important role in downstream tasks such as machine translation and intelligent question answering. This paper proposes a new Chinese named entity recognition algorithm based on gazetteers and syntactic dependency trees. It targets the error propagation that arises because character vectors lack both word information and the syntactic dependency structure between words. The algorithm builds a graph from the gazetteer information and the syntactic dependency tree information in a sentence, then integrates this graph into the character vectors through an adaptive gated graph neural network (AGGNN), so that each character vector captures the semantic relationships between words and recognition accuracy improves. Experiments on the Ecommerce, Resume, QI, and other datasets show that the new method substantially improves the accuracy of Chinese named entity recognition.

Key words: gazetteers, syntactic dependency tree, sequence labeling, adaptive gated graph neural networks (AGGNN), bi-directional long short-term memory (BiLSTM), conditional random field (CRF)
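
The graph-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's actual edge definitions, so everything here is an assumption made for illustration: a gazetteer match links its first and last character, a dependency arc links the first character of the head word to the first character of the dependent word, and the sentence, word segmentation, and parse are hand-written examples. The resulting per-edge-type adjacency matrices are the kind of input a gated graph neural network could consume; the AGGNN itself is not shown.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (source char index, target char index, edge type)


def gazetteer_edges(chars: List[str], gazetteer: Set[str], max_len: int = 4) -> List[Edge]:
    """Assumed scheme: link the first and last character of each gazetteer match."""
    edges: List[Edge] = []
    n = len(chars)
    for start in range(n):
        # Candidate spans of 2..max_len characters starting at `start`.
        for end in range(start + 2, min(n, start + max_len) + 1):
            if "".join(chars[start:end]) in gazetteer:
                edges.append((start, end - 1, "gaz"))
    return edges


def dependency_edges(word_spans: List[Tuple[int, int]], heads: List[int]) -> List[Edge]:
    """Assumed scheme: link the head word's first character to the dependent's.

    word_spans[i] is the (start, end) character span of word i, end exclusive.
    heads[i] is the index of word i's head word, or -1 for the root.
    """
    edges: List[Edge] = []
    for dep, head in enumerate(heads):
        if head < 0:
            continue  # the root has no incoming arc
        edges.append((word_spans[head][0], word_spans[dep][0], "dep"))
    return edges


def build_graph(
    chars: List[str],
    gazetteer: Set[str],
    word_spans: List[Tuple[int, int]],
    heads: List[int],
) -> Dict[str, List[List[int]]]:
    """Return one character-level adjacency matrix per edge type."""
    n = len(chars)
    adjacency = {kind: [[0] * n for _ in range(n)] for kind in ("gaz", "dep")}
    all_edges = gazetteer_edges(chars, gazetteer) + dependency_edges(word_spans, heads)
    for src, dst, kind in all_edges:
        adjacency[kind][src][dst] = 1
    return adjacency


# Hypothetical example sentence with a hand-annotated segmentation and parse.
chars = list("小明在上海工作")                      # 7 characters
word_spans = [(0, 2), (2, 3), (3, 5), (5, 7)]      # 小明 / 在 / 上海 / 工作
heads = [3, 3, 1, -1]                              # 工作 is the root
graph = build_graph(chars, {"上海"}, word_spans, heads)
```

Keeping one adjacency matrix per edge type lets a downstream graph network weight lexical and syntactic edges differently, which is one plausible reading of "adaptive" gating; the source does not confirm that this matches the authors' design.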
