Computer Engineering and Applications, 2023, Vol. 59, Issue (6): 277-282. DOI: 10.3778/j.issn.1002-8331.2109-0497

• Engineering and Applications •

Research on Event Extraction Method for Judicial Data

JIA Zhen, DING Zehua, CHEN Yanping, HUANG Ruizhang, QIN Yongbin

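1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
• Published: 2023-03-15 Online: 2023-03-15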

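Abstract: Events in judicial data mainly describe changes in the behavioral state between the criminal subject and the object in a case, and identifying judicial events can effectively support research on intelligent assisted case handling. Existing event extraction techniques mainly recognize events through trigger words and then extract the corresponding arguments according to predefined templates. Their main drawbacks are that only predefined event types can be extracted and that the extracted events are not necessarily the semantic center of the sentence. To address these problems, this paper proposes a method for defining judicial events based on the predicate head and builds a neural network model that combines character-level and word-level semantic information. The model uses character embeddings to obtain the semantic information of each character and a CNN to obtain word feature information. After the word features are combined with the character information, a Cross-BiLSTM cross-learns context-dependent representations of the character-word interaction information, and a CRF computes the optimal label path over the characters. Experiments show that the model achieves an F1 score of 84.41% on the judicial dataset, 4.8% higher than the comparison methods.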

Key words: event extraction, predicate head, information extraction, neural networks, semantic information
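The abstract states that a CRF computes the optimal label path over the characters. At inference time, a linear-chain CRF usually does this with Viterbi decoding. The sketch below is a minimal, framework-free illustration and is not the authors' implementation. It assumes the following inputs: per-character emission scores (such as a Cross-BiLSTM would output), a label-transition score matrix, and start scores. The function name `viterbi_decode`, the BIO-style labels `O`/`B-EVT`/`I-EVT`, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence and its score (hypothetical helper).

    emissions[t][j]:   score of label j at character position t (e.g. from a BiLSTM)
    transitions[i][j]: score of moving from label i to label j
    start[j]:          score of starting the sequence with label j
    """
    n_labels = len(start)
    # score[j]: best score of any path that ends in label j at the current position
    score = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # best previous label i for reaching label j at position t
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # start from the best final label and follow the backpointers to the front
    best_last = max(range(n_labels), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    labels = ["O", "B-EVT", "I-EVT"]  # hypothetical BIO tag set
    start = [0.0, 0.0, -1000.0]  # a sequence may not start inside an event
    transitions = [
        [0.0, 0.0, -1000.0],  # from O: O -> I-EVT is effectively forbidden
        [0.0, 0.0, 1.0],      # from B-EVT
        [0.0, 0.0, 1.0],      # from I-EVT
    ]
    emissions = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 2.0], [1.0, 0.0, 0.5]]
    path, best = viterbi_decode(emissions, transitions, start)
    print([labels[i] for i in path], best)  # prints ['O', 'B-EVT', 'I-EVT', 'I-EVT'] 9.5
```

Dynamic programming keeps decoding at O(n · L²) for n characters and L labels, instead of enumerating all Lⁿ label sequences. The large negative transition score is one common way to rule out invalid BIO transitions, such as O followed by I-EVT.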