Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (17): 243-250. DOI: 10.3778/j.issn.1002-8331.1906-0062

Research on Entity Relation Extraction Based on Distant Supervision in Bidding Field

CHEN Yuting, LIU Xuhong, LIU Xiulei   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  2. School of Computer, Beijing Information Science and Technology University, Beijing 100101, China
  • Online: 2020-09-01 Published: 2020-08-31

面向招投标领域的远程监督实体关系抽取研究

陈雨婷,刘旭红,刘秀磊   

  1. 北京信息科技大学 网络文化与数字传播北京市重点实验室,北京 100101
  2. 北京信息科技大学 计算机学院,北京 100101

Abstract:

Bidding websites contain a wealth of potential intelligence information. Distant supervision uses a knowledge base to annotate data automatically, which removes the heavy dependence of traditional information extraction methods on manual work during corpus preparation and can effectively improve extraction efficiency. However, the automatic annotation introduces noisy data, so the extraction results are not ideal. This paper proposes a distantly supervised entity relation extraction method based on the factor graph model. Drawing on domain characteristics, it uses knowledge fusion to improve the quality of entity extraction, and then proposes a noise reduction method based on learning from negative examples to address the shortcomings of distant supervision. Experimental results show that the proposed method effectively reduces noise interference and improves relation extraction performance.

Key words: entity relation extraction, distant supervision, factor graph model, knowledge fusion

摘要:

招投标网站资源中蕴含着丰富的情报信息。“远程监督”方法借助知识库自动标注数据,弥补了传统信息抽取方法在语料准备阶段对人工强依赖的缺陷,可有效提高信息抽取效率。该方法会引入噪声数据,导致信息抽取效果不够理想。因此,提出一种基于因子图模型的远程监督实体关系抽取方法,并结合领域特征,采用知识融合技术提高实体抽取质量,进而针对远程监督的缺陷提出基于负例数据学习的降噪方法。实验结果表明,该方法能够有效减少“噪声”干扰,提高关系抽取性能。

关键词: 实体关系抽取, 远程监督, 因子图模型, 知识融合
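
The automatic annotation step that the abstract attributes to distant supervision can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` type, the `distant_label` function, the positive and negative pair sets, and the example sentences are all hypothetical, and the factor graph model and knowledge fusion steps are not shown. The sketch only shows how a knowledge base can label sentences without manual work, where the noise comes from, and how known negative pairs can supply explicit negative examples.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class Mention(NamedTuple):
    """A sentence together with one candidate entity pair found in it."""
    sentence: str
    head: str  # hypothetical: a bidding company
    tail: str  # hypothetical: a tendered project


def distant_label(
    mentions: Iterable[Mention],
    kb_positive: Set[Tuple[str, str]],
    kb_negative: Set[Tuple[str, str]],
) -> List[Tuple[Mention, Optional[bool]]]:
    """Label each mention by looking its entity pair up in the knowledge base.

    True  -> the pair is a known instance of the relation
    False -> the pair is a known non-instance (a negative example)
    None  -> the knowledge base says nothing about the pair
    """
    labelled = []
    for mention in mentions:
        pair = (mention.head, mention.tail)
        if pair in kb_positive:
            label: Optional[bool] = True
        elif pair in kb_negative:
            label = False
        else:
            label = None
        labelled.append((mention, label))
    return labelled


# Hypothetical knowledge base for a "won the bid for" relation.
kb_positive = {("Company A", "Project X")}
kb_negative = {("Company B", "Project X")}

mentions = [
    Mention("Company A won the bid for Project X.", "Company A", "Project X"),
    # Noise: the pair is in the knowledge base, but this sentence does not
    # express the relation. Distant supervision still labels it positive.
    Mention("Company A attended a briefing on Project X.", "Company A", "Project X"),
    Mention("Company B was disqualified from Project X.", "Company B", "Project X"),
    Mention("Company C submitted a tender for Project Y.", "Company C", "Project Y"),
]

labels = [label for _, label in distant_label(mentions, kb_positive, kb_negative)]
```

The second mention is the kind of wrongly labelled example the abstract calls noise: the lookup ignores what the sentence actually says. Explicit negative pairs such as the third mention give a downstream model counter-examples to learn from, which is the general idea behind noise reduction based on negative data.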