Computer Engineering and Applications, 2023, Vol. 59, Issue (3): 276-281. DOI: 10.3778/j.issn.1002-8331.2108-0351

• Engineering and Applications •

Research on Prediction of Crime Based on Self-Supervised Learning Language Model

TIAN Jiewen, YANG Liang, ZHANG Li, MAO Guoqing, LIN Hongfei   

  1. College of Information Engineering, Dalian Ocean University, Dalian, Liaoning 116023, China
  2. Dalian University of Technology, Dalian, Liaoning 116024, China
  3. Beijing Institute of Computer Technology and Application, Beijing 100854, China
  4. Beijing GridSum Technology Co., Ltd., Beijing 100083, China
  • Online: 2023-02-01 Published: 2023-02-01

基于自监督学习语言模型的罪名预测研究

田杰文,杨亮,张琍,毛国庆,林鸿飞   

  1. 大连海洋大学 信息工程学院,辽宁 大连 116023
  2. 大连理工大学,辽宁 大连 116024
  3. 北京计算机技术及应用研究所,北京 100854
  4. 北京国双科技有限公司,北京 100083

Abstract: To address crime prediction, a subtask of legal judgment prediction, and to capture the contextual semantics of case fact descriptions more efficiently, this paper proposes ALBT, a Chinese crime prediction model that combines ALBERT (A Lite BERT) with the convolutional neural network TextCNN. First, the ALBERT model converts the fact description of a legal text into a vector representation and extracts the key features of the fact description. The extracted features are then fed into the TextCNN model for classification, which yields the predicted crime for the fact description. On the dataset of the 2018 “China Law Research Cup” judicial artificial intelligence challenge, the model achieves an accuracy of 88.1%. The experimental results show that the model achieves better prediction performance on Chinese crime prediction.

Key words: ALBERT, TextCNN, feature extraction, text categorization, crime prediction
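
The two-stage pipeline described in the abstract (contextual token vectors from an encoder, followed by a TextCNN classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random `token_vecs` array merely stands in for ALBERT output, and the kernel widths, filter count, hidden size, and number of classes are all hypothetical values chosen for illustration.

```python
import numpy as np


def textcnn_forward(token_vecs, conv_filters, w_out, b_out):
    """Classify one text from its token vectors with a TextCNN-style head.

    token_vecs:   (seq_len, hidden) contextual vectors (stand-in for ALBERT output).
    conv_filters: list of (kernel, bias); kernel has shape (width, hidden, n_filters).
    w_out, b_out: weights of the final linear layer.
    Returns a probability distribution over classes.
    """
    pooled = []
    for kernel, bias in conv_filters:
        width = kernel.shape[0]
        n_pos = token_vecs.shape[0] - width + 1
        # Collect every window of `width` consecutive tokens: (n_pos, width, hidden).
        windows = np.stack([token_vecs[i:i + width] for i in range(n_pos)])
        # Convolution as a contraction over (width, hidden): (n_pos, n_filters).
        feature_maps = np.tensordot(windows, kernel, axes=([1, 2], [0, 1])) + bias
        feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
        pooled.append(feature_maps.max(axis=0))       # max-over-time pooling
    features = np.concatenate(pooled)                 # one vector per text
    logits = features @ w_out + b_out
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()


# Hypothetical sizes, chosen only to keep the example small.
SEQ_LEN, HIDDEN, N_FILTERS, N_CLASSES = 32, 16, 8, 5
KERNEL_WIDTHS = (2, 3, 4)

rng = np.random.default_rng(0)
# Placeholder for the encoder output; a real system would obtain this from ALBERT.
token_vecs = rng.normal(size=(SEQ_LEN, HIDDEN))
conv_filters = [
    (rng.normal(scale=0.1, size=(w, HIDDEN, N_FILTERS)), np.zeros(N_FILTERS))
    for w in KERNEL_WIDTHS
]
w_out = rng.normal(scale=0.1, size=(N_FILTERS * len(KERNEL_WIDTHS), N_CLASSES))
b_out = np.zeros(N_CLASSES)

probs = textcnn_forward(token_vecs, conv_filters, w_out, b_out)
predicted_class = int(np.argmax(probs))
```

With untrained random weights the predicted class is meaningless; the sketch only shows how several convolution widths, max-over-time pooling, and a softmax layer combine into one prediction per fact description.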

摘要: 针对解决法律判决预测中的罪名预测问题,为了更高效地捕捉案件事实描述中上下文的语义信息,提出了一种结合ALBERT(A Lite BERT)和卷积神经网络CNN(TextCNN)的中文罪名预测模型ALBT。模型利用ALBERT模型将法律文本的事实描述转化成向量表示,提取事实描述中的关键特征,把提取到的特征送入卷积神经网络TextCNN模型中进行分类预测,最终完成对事实描述中的罪名预测。实验在2018“中国法研杯”司法人工智能挑战赛构建的数据集上精度达到了88.1%。实验结果表明,模型在中文罪名预测上能够达到更好的预测效果。

关键词: ALBERT, TextCNN, 特征提取, 文本分类, 罪名预测