Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (21): 115-122. DOI: 10.3778/j.issn.1002-8331.1908-0164


Entity Attributes Extraction Based on Text Simplification

WU Cheng, WANG Chaokun, WANG Muxian   

  1. School of Software, Tsinghua University, Beijing 100084, China
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Online: 2020-11-01 Published: 2020-11-03





This paper studies entity attribute extraction from unstructured Chinese text. Traditional information extraction methods perform poorly on such text because of long, difficult sentences and the diversity of natural language expressions, so text simplification (TS) is introduced as a preprocessing step before extraction. TS is modeled as a sequence-to-sequence (seq2seq) task and implemented with the seq2seq-RNN model from the machine translation field. To improve the model, several strategies are introduced, including pre-trained word vectors, a common vocabulary, POS tagging, and a simplifying scoring function, so that the model focuses more on syntactic transformation during TS. A simple rule-based method then extracts information tuples from the simplified text, and entity attributes are extracted from those tuples. Experimental results show that the improved seq2seq-RNN performs better on text simplification, and that more information, with higher accuracy, is extracted from the simplified text than from the original text.
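The last two stages described above (rule-based tuple extraction from simplified text, then entity attributes from those tuples) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state the actual rules, so the single rule used here (nouns before the first verb form the subject, everything after it forms the object), the coarse tags "N", "V", "P", and "PUNCT", the function names `extract_tuple` and `collect_attributes`, and the English toy sentences are all assumptions made for readability. The paper itself works on Chinese text.

```python
from collections import defaultdict

def extract_tuple(tagged):
    """Return (subject, predicate, object) from one simplified, POS-tagged
    sentence, or None if the assumed rule does not match.

    `tagged` is a list of (token, tag) pairs with hypothetical coarse tags.
    """
    # Assumed rule: the first verb is the predicate.
    verb_idx = next((i for i, (_, tag) in enumerate(tagged) if tag == "V"), None)
    if verb_idx is None:
        return None
    # Nouns before the verb form the subject.
    subject = [tok for tok, tag in tagged[:verb_idx] if tag == "N"]
    # Everything after the verb, except punctuation, forms the object.
    obj = [tok for tok, tag in tagged[verb_idx + 1:] if tag != "PUNCT"]
    if not subject or not obj:
        return None
    return (" ".join(subject), tagged[verb_idx][0], " ".join(obj))

def collect_attributes(tuples):
    """Group tuples by subject so each entity maps predicate -> object."""
    attrs = defaultdict(dict)
    for subj, pred, obj in tuples:
        attrs[subj][pred] = obj
    return dict(attrs)

# Toy simplified sentences (illustrative data, not from the paper).
s1 = [("Li", "N"), ("Bai", "N"), ("wrote", "V"), ("poems", "N"), (".", "PUNCT")]
s2 = [("Li", "N"), ("Bai", "N"), ("lived", "V"), ("in", "P"),
      ("Chang'an", "N"), (".", "PUNCT")]

tuples = [t for t in (extract_tuple(s) for s in (s1, s2)) if t is not None]
attrs = collect_attributes(tuples)
print(attrs)
```

A rule this simple only works because the input is assumed to be already simplified into short subject-predicate-object sentences, which is the role the abstract assigns to the TS step.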

Key words: text simplification, information extraction, entity attributes, natural language processing, neural network


