Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (1): 125-129.

• Database, Data Mining, Machine Learning •

Research on Chinese-patents-oriented open entity relation extraction

ZHAO Qimeng, WANG Peiyan, FENG Haoguo, CAI Dongfeng

  1. Research Center for Knowledge Engineering, Shenyang Aerospace University, Shenyang 110136, China
  • Online: 2015-01-01 Published: 2015-01-06

Abstract: The main goal of information extraction is to transform unstructured or semi-structured text into structured information, and entity relation extraction is one of its major tasks. Traditional entity relation extraction requires relation types to be specified in advance and extraction rules to be written by hand, so it does not scale to massive text collections. Open Information Extraction (OIE) addresses these problems and has made significant progress for English and other Western languages, but research on Chinese open relation extraction remains scarce. This paper therefore studies a hierarchical approach to open entity relation extraction from Chinese patents that applies Markov Logic Networks (MLN) on top of chunk-level annotation with outer and inner layers. Experiments show that starting from chunks makes sentences easier to analyze, and that outer-layer and inner-layer chunks can be handled in a uniform way, which reduces engineering cost. With the same features, MLN-based relation extraction also outperforms a support vector machine (SVM): the F-scores for the outer and inner layers reach 77.92% and 69.20%, respectively.

Key words: Chinese patents dependency treebank, open entity relation extraction, Markov Logic Networks (MLN)