Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (2): 314-320. DOI: 10.3778/j.issn.1002-8331.2108-0118

• Engineering and Applications •

Research and Application of Named Entity Recognition for Bidding Materials

MI Jianxia, XIE Hongwei

  1. School of Software, Taiyuan University of Technology, Taiyuan 030024, China
  • Online: 2023-01-15 Published: 2023-01-15

Abstract: In the bidding field, organizations write material data in different ways. Recognizing the entities in material data makes it possible to standardize that data, which provides a basis for subsequent material queries and analysis. Traditional named entity recognition methods for materials suffer from inaccurate word segmentation, cannot handle polysemy effectively, and ignore the glyph features specific to Chinese characters, all of which degrade recognition performance. To address these problems, a CB-BiLSTM-CRF model is proposed. It uses a convolutional neural network (CNN) to extract features from the Wubi encoding of Chinese characters and combines them with the character features obtained from BERT, strengthening the representation of grammatical and semantic information in different contexts. A BiLSTM model then performs deeper feature extraction on the combined features, and a CRF model produces the optimal label sequence. Experimental results show that the model achieves an F1 score of 95.82% on the collected bidding material data, outperforming other commonly used models. On this basis, an “Intelligent Material” online recognition web platform is built, allowing users to quickly extract useful information from large amounts of data.

Key words: named entity recognition, bidding material recognition, BERT pre-trained model, bi-directional long short-term memory (BiLSTM), conditional random field (CRF)
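
The abstract states that the CRF layer produces the optimal sequence from the features extracted by the BiLSTM. As an illustration only, the following is a minimal pure-Python sketch of Viterbi decoding, the standard way to find the highest-scoring tag sequence in a linear-chain CRF. The three-tag B/I/O scheme, the emission and transition scores, and the function name `viterbi_decode` are all assumptions made for this example; the paper's actual tag set, scores, and implementation are not given in the abstract. Start and stop transitions are omitted for brevity.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def viterbi_decode(
    emissions: List[Scores],
    transitions: Dict[str, Scores],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][tag]    : per-character tag score (e.g. from a BiLSTM)
    transitions[a][b]    : score for moving from tag a to tag b
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    backptr: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes the path score into `cur`.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backptr.append(pointers)

    # Trace back from the best final tag to recover the full path.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical scores for a three-character input (illustrative values only).
tags = ["B", "I", "O"]
emissions = [
    {"B": 1.5, "I": 0.0, "O": 2.0},
    {"B": 0.5, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.5, "O": 1.0},
]
# All transitions are neutral except O -> I, which is heavily penalized
# because an entity cannot continue without having begun.
transitions = {a: {b: 0.0 for b in tags} for a in tags}
transitions["O"]["I"] = -10.0

path, score = viterbi_decode(emissions, transitions, tags)
print(path, score)
```

With these made-up scores, choosing the best tag for each character independently would give O, I, O, which is not a valid labeling. The transition penalty steers the decoder to B, I, O instead, which is the kind of sequence-level consistency a CRF layer contributes on top of per-character scores.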