Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (9): 168-174. DOI: 10.3778/j.issn.1002-8331.1901-0151

• Pattern Recognition and Artificial Intelligence •

Research on Joint Extraction of Triggers and Attribute-Value Pairs

WANG Yinghuan, XUE Chan, BAO Xianyu, WU Gongqing

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
  2. Shenzhen Academy of Inspection and Quarantine, Shenzhen, Guangdong 518045, China
  • Published: 2020-05-01 Online: 2020-04-29

Abstract:

Traditional attribute-value pair extraction methods are usually applied to short texts and are limited to extracting string attributes. This work proposes a joint extraction method for triggers and attribute-value pairs. The method identifies triggers to locate the informative sentences in long texts and thereby determines the values of binary semantic attributes. It also accounts for the interdependence among triggers, string attributes, and attribute values by building a joint labeling model based on conditional random fields, which improves the extraction performance for string attribute-value pairs. Experimental results show that, compared with traditional methods, the proposed method can extract binary semantic attribute-value pairs and improves the precision, recall, and F-measure of string attribute extraction by 15.3%, 15.5%, and 15.5%, respectively, while reducing the average extraction time by 76.29%.

Key words: conditional random field, sequence labeling, attribute-value pair extraction, trigger extension
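
The joint labeling idea in the abstract can be illustrated with a small sketch. It assumes a hypothetical BIO tag set (`TRG` for triggers, `ATT` for string attributes, `VAL` for attribute values); the paper's actual tag set and features are not given here. The tag sequence is written by hand in place of the output of a trained conditional random field. The helper names `decode_spans`, `has_trigger`, and `pair_attributes` are illustrative only.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (kind, text), where kind is TRG, ATT, or VAL


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO-tagged tokens into (kind, text) spans.

    An I- tag that does not continue the current span is dropped.
    """
    spans: List[Span] = []
    kind: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == kind:
            words.append(token)  # continue the open span
            continue
        if kind is not None:
            spans.append((kind, " ".join(words)))  # close the open span
        if prefix == "B":
            kind, words = label, [token]  # open a new span
        else:
            kind, words = None, []
    if kind is not None:
        spans.append((kind, " ".join(words)))
    return spans


def has_trigger(spans: List[Span]) -> bool:
    """A sentence counts as informative if it contains a trigger span."""
    return any(kind == "TRG" for kind, _ in spans)


def pair_attributes(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair each ATT span with the next VAL span that follows it."""
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for kind, text in spans:
        if kind == "ATT":
            pending = text
        elif kind == "VAL" and pending is not None:
            pairs.append((pending, text))
            pending = None
    return pairs


# Hand-written example; in practice the tags would be predicted by the model.
tokens = ["Specifications", ":", "net", "weight", "500", "g"]
tags = ["B-TRG", "O", "B-ATT", "I-ATT", "B-VAL", "I-VAL"]

spans = decode_spans(tokens, tags)
pairs = pair_attributes(spans) if has_trigger(spans) else []
print(pairs)
```

Keeping triggers, attributes, and values in one label sequence lets a sequence model learn dependencies among the three, which is the motivation the abstract gives for joint labeling. The pairing rule used here (nearest following value) is a simplifying assumption.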