Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (22): 142-149. DOI: 10.3778/j.issn.1002-8331.2104-0397

• Pattern Recognition and Artificial Intelligence •

Knowledge Base Construction Method for Scientific and Technical Information Analysis

WANG Yong, JIANG Yang, WANG Hongbin, HOU Sha   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  2. The 714 Research Institute of China State Shipbuilding Corporation Limited, Beijing 100101, China
  • Online: 2022-11-15 Published: 2022-11-15

面向科技情报分析的知识库构建方法

王勇,江洋,王红滨,侯莎   

  1. 哈尔滨工程大学 计算机科学与技术学院,哈尔滨 150001
  2. 中国船舶集团有限公司 第七一四研究所,北京 100101

Abstract: The central step in constructing a knowledge base is extracting triples from text, which requires both entity extraction and entity relation extraction. For entity extraction, a CWATT-BiLSTM-LSTMd (character word attention-bidirectional long short-term memory-long short-term memory) model is proposed. The model effectively addresses the polysemy problem in entity extraction and models the dependencies between tags. Entity relation extraction is then performed on the basis of the extracted entities. To overcome the limitations of distant supervision in entity relation extraction, an RL-TreeLSTM (reinforcement learning tree long short-term memory) model based on deep reinforcement learning is proposed. The model consists of a selector and a classifier: the selector chooses valid sentences and passes them to the classifier, and the classifier predicts the relation label of the entity pair in each sentence. The selector and classifier are trained jointly to optimize both selection and classification, which effectively reduces the noise introduced by distant supervision. Experimental results show that the proposed models and methods effectively improve the extraction of entities and their relations.

Key words: knowledge base construction, neural networks, reinforcement learning, entity extraction, entity relation extraction
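
The abstract describes triple extraction as entity extraction followed by entity relation extraction. The sketch below only illustrates how the outputs of those two stages might be combined into triples; it is not the paper's implementation. The BIO tagging scheme, the function names `decode_bio` and `build_triples`, the toy `rule_classifier`, and the relation name `located_in` are all assumptions made for illustration. In the paper, the tags would come from the CWATT-BiLSTM-LSTMd model and the relation labels from the RL-TreeLSTM classifier.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[str, str]        # (surface text, entity type)
Triple = Tuple[str, str, str]   # (head entity, relation, tail entity)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Collect entity spans from a tag sequence (BIO scheme is an assumption)."""
    entities: List[Entity] = []
    span: List[str] = []
    span_type = ""
    for token, tag in zip(tokens, tags):
        # An I- tag extends the open span only if the entity type matches.
        if tag.startswith("I-") and bool(span) and tag[2:] == span_type:
            span.append(token)
            continue
        if span:  # any other tag closes the span that was open
            entities.append((" ".join(span), span_type))
            span, span_type = [], ""
        if tag.startswith("B-"):  # a B- tag opens a new span
            span, span_type = [token], tag[2:]
    if span:
        entities.append((" ".join(span), span_type))
    return entities


def build_triples(
    entities: List[Entity],
    classify: Callable[[Entity, Entity], Optional[str]],
) -> List[Triple]:
    """Ask a relation classifier about every ordered entity pair."""
    triples: List[Triple] = []
    for i, head in enumerate(entities):
        for j, tail in enumerate(entities):
            if i == j:
                continue
            relation = classify(head, tail)
            if relation is not None:
                triples.append((head[0], relation, tail[0]))
    return triples


def rule_classifier(head: Entity, tail: Entity) -> Optional[str]:
    """Hypothetical stand-in for a trained relation classifier."""
    if head[1] == "ORG" and tail[1] == "LOC":
        return "located_in"
    return None


TOKENS = ["Harbin", "Engineering", "University", "is", "located", "in", "Harbin"]
TAGS = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-LOC"]

ENTITIES = decode_bio(TOKENS, TAGS)
TRIPLES = build_triples(ENTITIES, rule_classifier)
# TRIPLES == [("Harbin Engineering University", "located_in", "Harbin")]
```

Passing the relation classifier as a callable keeps the two stages separate, so either stage could be replaced by a trained model without changing how triples are assembled.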

摘要: 在知识库构建中,最重要的部分就是提取文本中的三元组,而三元组的提取需要实体抽取和实体关系抽取技术。针对实体抽取提出了一种CWATT-BiLSTM-LSTMd(character word attention-bidirectional long short-term memory-long short-term memory)模型。该模型可以有效解决实体抽取中一词多义问题,并且可以模拟标签的依赖问题。在实体抽取的基础上进行实体关系的抽取,为解决实体关系抽取中远程监督的局限性,提出一种基于强化深度学习的RL-TreeLSTM(reinforcement learning tree long short-term memory)模型。该模型分为选择器和分类器,选择器选择有效的句子传入分类器,分类器对句子中实体对的关系标签进行预测。选择器和分类器共同训练以优化选择和分类过程,可以有效降低远程监督带来的噪音。实验结果表明,提出的模型和方法能有效地提高实体及其关系的抽取性能。

关键词: 知识库构建, 神经网络, 强化学习, 实体抽取, 实体关系抽取