Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 153-160. DOI: 10.3778/j.issn.1002-8331.2004-0097


Improved Method Study on Extracting Keywords in Chinese Judgment Documents

BAI Fengbo, CHANG Lin, WANG Shifan, LI Bin, WANG Yingjie, ZHOU Hong, LIU Yao   

  1. Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing 100088, China
  2. Di’an Institute of Forensic Sciences in Zhejiang, Hangzhou 310000, China
  3. School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  4. College of Information Engineering, Dalian University, Dalian, Liaoning 116622, China
  5. Institute of Forensic Sciences, Ministry of Public Security, Beijing 100038, China
  • Online: 2020-12-01  Published: 2020-11-30





Under the national policy of governing the country according to law, combining artificial intelligence techniques such as natural language processing (NLP) and information retrieval (IR) with the needs of the rule of law is an inevitable trend. This paper studies keyword extraction for judgment documents, with the aim of giving judicial service workers accurate and comprehensive intelligent assistance and improving their work efficiency. To address the shortcomings of traditional keyword extraction methods, it proposes an improved TF-IDF algorithm named Improved Algorithm for Keyword Extraction in Forensics (IAKEF). The algorithm takes multiple factors into account, namely part of speech, word length, word span, position and document category. It builds on the graph-based TextRank algorithm and introduces the concepts of information entropy, dispersion and feature fusion. It mainly addresses two weaknesses of traditional algorithms: they neglect the semantics of words, and they ignore how information is distributed between classes and within a class. As a result, text features can be selected more effectively. Comparative experiments analyze and verify the effect of the improvements. The results show that, compared with the traditional algorithm, the improved algorithm is significantly better in accuracy, recall and F1-measure.
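
The abstract does not state IAKEF's formulas, so the following is only a minimal sketch of the general idea behind one of its ingredients: classic TF-IDF rescaled by how evenly a term is spread across document categories, measured with information entropy. The function names, the toy tokens and labels, and the factor `1 - H / log(n_classes)` are assumptions made for illustration; they are not the authors' method, and the other factors (part of speech, length, span, position, TextRank) are left out.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF. docs is a list of token lists; returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def class_entropy(term, docs, labels):
    """Shannon entropy (in nats) of the term's occurrence counts across classes."""
    per_class = Counter()
    for doc, label in zip(docs, labels):
        per_class[label] += doc.count(term)
    total = sum(per_class.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in per_class.values() if c > 0)


def entropy_weighted_tf_idf(docs, labels):
    """Hypothetical combination: TF-IDF times (1 - normalized class entropy).

    A term concentrated in one class keeps its TF-IDF score; a term spread
    evenly over all classes is pushed towards zero.
    """
    n_classes = len(set(labels))
    max_entropy = math.log(n_classes) if n_classes > 1 else 1.0
    vocab = {term for doc in docs for term in doc}
    factor = {t: 1.0 - class_entropy(t, docs, labels) / max_entropy for t in vocab}
    return [{t: s * factor[t] for t, s in doc_scores.items()}
            for doc_scores in tf_idf(docs)]


# Toy, already-tokenized documents with made-up category labels.
docs = [
    ["theft", "defendant", "evidence", "court"],
    ["theft", "property", "court"],
    ["contract", "breach", "evidence", "court"],
    ["contract", "damages", "court"],
]
labels = ["criminal", "criminal", "civil", "civil"]
scores = entropy_weighted_tf_idf(docs, labels)
```

In this toy data, "theft" and "evidence" receive the same plain TF-IDF score in the first document, but "evidence" occurs equally often in both categories, so the entropy factor suppresses it while "theft" keeps its score.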

Key words: improved TF-IDF, keyword extraction, information entropy, dispersion, feature fusion


