Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (8): 117-124. DOI: 10.3778/j.issn.1002-8331.2009-0363

• Pattern Recognition and Artificial Intelligence •

Chinese Named Entity Recognition Based on Gated Multi-Feature Extractors

YANG Rongying, HE Qing, DU Nisuo   

  1. College of Big Data & Information Engineering, Guizhou University, Guiyang 550025, China
    2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
    3. Guizhou Province Big Data Industry Development and Application Research Institute, Guizhou University, Guiyang 550025, China
  • Online: 2022-04-15 Published: 2022-04-15

Abstract: Without introducing auxiliary features, the model focuses on the text itself and builds multiple feature extractors to mine abstract, deep, high-dimensional features of the text sequence. A BERT pre-trained model produces information-rich word embeddings, which are fed into a BiLSTM and an IDCNN in parallel for a first round of feature extraction. To capture higher-dimensional features and enable multi-channel information transmission with flow control, a gating mechanism is introduced into the IDCNN, and a multi-head self-attention mechanism is added to improve the efficiency of feature extraction. A shared BiLSTM lets feature information circulate between the two branches and strengthens the feature representation, while two CRF models enrich the feature distribution and pass feature information across layers, improving the accuracy of tag-sequence prediction. Tests on two datasets against four NER models show that the F1 score improves to a certain extent.
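As an illustration of the gating idea described above, the following is a minimal PyTorch sketch of a gated IDCNN branch. The dilation schedule, hidden size, projection from 768-dimensional BERT embeddings, and the GLU-style split-and-sigmoid gate are all assumptions made for demonstration; the paper's actual layer configuration is not given in this abstract.

```python
import torch
import torch.nn as nn


class GatedIDCNN(nn.Module):
    """Iterated dilated CNN whose channels are modulated by learned gates."""

    def __init__(self, hidden_dim: int = 256, dilations=(1, 2, 4)):
        super().__init__()
        # Each conv emits 2*hidden_dim channels: one half carries features,
        # the other half becomes a [0, 1] gate controlling information flow.
        self.blocks = nn.ModuleList(
            nn.Conv1d(hidden_dim, 2 * hidden_dim, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim); Conv1d expects channels first.
        h = x.transpose(1, 2)
        for conv in self.blocks:
            feat, gate = conv(h).chunk(2, dim=1)
            h = feat * torch.sigmoid(gate)  # gated multi-channel flow control
        return h.transpose(1, 2)


if __name__ == "__main__":
    emb = torch.randn(2, 32, 768)      # stand-in for BERT-base embeddings
    proj = nn.Linear(768, 256)         # project down to the CNN width
    out = GatedIDCNN(hidden_dim=256)(proj(emb))
    print(out.shape)                   # torch.Size([2, 32, 256])
```

In this reading of the abstract, the sigmoid gate is what implements "multi-channel transmission and flow control": each dilated convolution decides how much of its output to pass on rather than transmitting everything.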

Key words: feature extraction, word embedding, gating mechanism, shared BiLSTM, multi-head self-attention