Research on Event Extraction Method for Judicial Data

doi:10.3778/j.issn.1002-8331.2109-0497

Abstract

Events in judicial data mainly describe changes in the behavioral state between the criminal subject and the object in a case, and identifying judicial events can effectively support research on intelligent case-handling assistance. Existing event extraction techniques mainly recognize events through trigger words and then extract the corresponding arguments according to a predefined template. Their main drawbacks are that only predefined event types can be extracted and that the extracted events are not necessarily the semantic center of the sentence. To address these problems, this paper proposes a method for defining judicial events based on the predicate head and builds a neural network model that combines character-level and word-level semantic information. The model uses character embeddings to obtain the semantic information of characters and obtains word feature information through a CNN. After the word feature information is combined, Cross-BiLSTM cross-learns context-dependent representations of the character-word interaction information. Finally, a CRF computes the optimal tag path over the characters. Experiments show that the model reaches an F1 value of 84.41% on the judicial data set, exceeding the comparison method by 4.8%.

Key words: event extraction, predicate head, information extraction, neural networks, semantic information
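
The final step described in the abstract, where a CRF selects the optimal tag path over the characters of a sentence, is commonly decoded with the Viterbi algorithm. The following is a minimal sketch of that decoding step only, not the authors' implementation. The O/B/I tag set, all scores, and the function name `viterbi_decode` are assumptions made for illustration; in the model described above, the per-character scores would come from the Cross-BiLSTM layer and the transition scores would be learned.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][j]   -- score of tag j at character position t
    transitions[i][j] -- score of moving from tag i to tag j
    """
    if not emissions:
        return [], 0.0

    k = len(tags)
    # Best score of any path ending in each tag at the first position.
    score = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(k):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    best_last = max(range(k), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path], score[best_last]


# Hypothetical tag set: O = outside, B/I = begin/inside of a predicate head.
tags = ["O", "B", "I"]

# Hypothetical transition scores; the O -> I transition is heavily penalized.
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B
    [0.0, 0.0, 0.0],     # from I
]

# Hypothetical per-character scores for a four-character sentence.
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 1.6],  # locally, I scores slightly higher than B
    [0.1, 0.3, 2.0],
    [1.8, 0.2, 0.4],
]

path, best = viterbi_decode(emissions, transitions, tags)
print(path, best)
```

With these made-up scores, picking the best tag at each position independently would give O, I, I, O, which contains the penalized O -> I transition. Decoding over the whole path instead yields O, B, I, O, which shows why a path-level decoder is placed on top of the per-character scores.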

JIA Zhen, DING Zehua, CHEN Yanping, HUANG Ruizhang, QIN Yongbin. Research on Event Extraction Method for Judicial Data[J]. Computer Engineering and Applications, 2023, 59(6): 277-282.
