Computer Engineering and Applications, 2023, Vol. 59, Issue (2): 178-184. DOI: 10.3778/j.issn.1002-8331.2108-0344

• Pattern Recognition and Artificial Intelligence •

Joint Entity and Relation Extraction for Multi-Crime Legal Documents with Multi-Task Learning

WANG Zhuoyue, CHEN Yanguang, XING Tiejun, SUN Yuanyuan, YANG Liang, LIN Hongfei   

  1. College of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
    2. Neusoft Corporation, Shenyang 110179, China
  • Online: 2023-01-15  Published: 2023-01-15

Abstract: Joint entity and relation extraction from legal documents is essential for automatically extracting the key facts of a case, and it is an important component of legal intelligence applications. Existing joint extraction methods achieve good results on datasets for specific crime types. However, these models learn only the textual characteristics of a single crime type during training, which limits their ability to generalize. As a result, their performance often drops when they are applied to documents covering multiple crimes. This paper therefore applies multi-task learning to joint entity and relation extraction in the multi-crime setting. The experiments use court documents from two major categories of crime: drug-related cases and larceny cases. A crime classification task is constructed as an auxiliary task to joint extraction. Both tasks are trained simultaneously with a dynamically weighted multi-task model based on feature filtering. Compared with the single-task model, the proposed model improves the overall F1 score by 2.4 percentage points, and by 1.6 and 3.2 percentage points on drug-related and larceny cases respectively.
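The abstract does not say how the losses of the main task (joint extraction) and the auxiliary task (crime classification) are weighted. It also does not describe how feature filtering works. To illustrate what "dynamic weighting" of two tasks can look like, the sketch below uses the Dynamic Weight Average (DWA) scheme from the multi-task learning literature as a stand-in. It is an assumption for illustration, not the authors' method. The task names `joint_extraction` and `crime_classification` and all loss values are hypothetical. In a real training loop the per-task losses would be framework tensors rather than floats.

```python
import math
from typing import Dict, List

# Stand-in sketch: Dynamic Weight Average (DWA), NOT the paper's
# (unspecified) weighting scheme. Task names below are hypothetical.


def dwa_weights(loss_history: Dict[str, List[float]],
                temperature: float = 2.0) -> Dict[str, float]:
    """Compute DWA task weights for the next epoch.

    loss_history maps each task name to its per-epoch average training
    losses, oldest first. The returned weights sum to the number of tasks.
    """
    tasks = list(loss_history)
    num_tasks = len(tasks)
    # Not enough history yet: start from equal weights.
    if any(len(losses) < 2 for losses in loss_history.values()):
        return {t: 1.0 for t in tasks}
    # Descent rate r_k = L_k(t-1) / L_k(t-2); a larger ratio means the
    # task's loss is falling more slowly, so it receives more weight.
    rates = {t: loss_history[t][-1] / loss_history[t][-2] for t in tasks}
    # Softmax over the temperature-scaled rates, rescaled to sum to K.
    exps = {t: math.exp(rates[t] / temperature) for t in tasks}
    total = sum(exps.values())
    return {t: num_tasks * exps[t] / total for t in tasks}


def combined_loss(task_losses: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses, used for one shared backward pass."""
    return sum(weights[t] * task_losses[t] for t in task_losses)


# Hypothetical per-epoch losses for the main task and the auxiliary task.
history = {
    "joint_extraction": [1.20, 0.90],      # falling quickly (ratio 0.75)
    "crime_classification": [0.70, 0.68],  # falling slowly (ratio ~0.97)
}
weights = dwa_weights(history)
loss = combined_loss({"joint_extraction": 0.85, "crime_classification": 0.66},
                     weights)
print(weights, loss)
```

Under DWA, a task whose loss is decreasing more slowly gets a larger weight in the next epoch. The temperature controls how sharply the weights differ, and the weights always sum to the number of tasks.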

Key words: joint entity and relation extraction, multi-task learning, legal intelligence