Computer Engineering and Applications, 2020, Vol. 56, Issue (20): 132-137. DOI: 10.3778/j.issn.1002-8331.1907-0390

• Pattern Recognition and Artificial Intelligence •

Sensitive Information Recognition Method Combining Trigger Event and Part of Speech Analysis

LIU Cong, WANG Yongli, ZHOU Zitao, YOU Feng, ZHANG Caijun

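  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Nari Group Corporation/State Grid Electric Power Research Institute Co., Ltd., Jiangsu Ruizhong Data Co., Ltd., Nanjing 210094, China
  3. Customer Service Center, State Grid Corporation of China, Nanjing 210094, China
  • Publication date: 2020-10-15; Release date: 2020-10-13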


Abstract:

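Traditional sensitive information recognition methods ignore the surrounding context and the part of speech of keywords, which causes missed detections (false negatives) and false alarms (false positives). To address this problem, this paper proposes STEAP, an improved method for recognizing sensitive information in text. The method first builds a dictionary of violence- and terrorism-related sensitive words. It then extracts sensitive trigger events to construct a sensitive trigger event sequence, and assigns weights to the information to be recognized based on the sensitive trigger events and the parts of speech of the keywords. Finally, it computes the similarity between the constructed trigger events and both the word vectors and the violence/terrorism sensitive dictionary, and combines these similarities with the weights to obtain the sensitivity of the text. Experimental results show that, compared with traditional sensitive information recognition methods, STEAP effectively identifies sensitive information in text and improves precision to a certain extent.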
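The abstract does not give STEAP's exact formulas, so the following is only a minimal Python sketch, under stated assumptions, of how a trigger-event and part-of-speech weighting could be combined with embedding similarity to produce a text sensitivity score. Everything here is hypothetical and not taken from the paper: tokens are assumed to be already segmented and POS-tagged with a simplified tag set (`v` verb, `n` noun, `t` time word); `VECTORS`, `SENSITIVE_DICT`, `POS_WEIGHT`, `TRIGGER_VERBS` and `TRIGGER_BOOST` are toy values; a "trigger event" is simplified to a trigger verb followed by its next noun; and sensitivity is taken as the weight-averaged maximum cosine similarity between each token and the dictionary terms.

```python
import math

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
VECTORS = {
    "attack":   [0.9, 0.1, 0.0],
    "bomb":     [0.8, 0.2, 0.1],
    "explode":  [0.85, 0.15, 0.05],
    "station":  [0.1, 0.9, 0.2],
    "tomorrow": [0.0, 0.2, 0.9],
    "visit":    [0.1, 0.3, 0.8],
}

# Hypothetical stand-in for a violence/terrorism sensitive dictionary.
SENSITIVE_DICT = ["attack", "bomb"]

# Hypothetical part-of-speech weights: verbs and nouns carry more signal.
POS_WEIGHT = {"v": 1.0, "n": 0.8}
DEFAULT_POS_WEIGHT = 0.3

# Hypothetical trigger verbs; such a verb plus the next noun forms a trigger event.
TRIGGER_VERBS = {"attack", "explode"}
TRIGGER_BOOST = 2.0


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def trigger_events(tokens):
    """Return (verb_index, noun_index) pairs for simplified trigger events."""
    events = []
    for i, (word, pos) in enumerate(tokens):
        if pos == "v" and word in TRIGGER_VERBS:
            for j in range(i + 1, len(tokens)):
                if tokens[j][1] == "n":
                    events.append((i, j))
                    break
    return events


def sensitivity(tokens):
    """Weighted mean of each token's best similarity to the sensitive dictionary."""
    in_event = {k for pair in trigger_events(tokens) for k in pair}
    num = den = 0.0
    for idx, (word, pos) in enumerate(tokens):
        vec = VECTORS.get(word)
        if vec is None:
            continue  # out-of-vocabulary tokens are skipped
        weight = POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
        if idx in in_event:
            weight *= TRIGGER_BOOST  # tokens inside a trigger event count more
        sim = max(cosine(vec, VECTORS[t]) for t in SENSITIVE_DICT)
        num += weight * sim
        den += weight
    return num / den if den else 0.0


if __name__ == "__main__":
    threat = [("explode", "v"), ("bomb", "n"), ("station", "n")]
    benign = [("visit", "v"), ("station", "n"), ("tomorrow", "t")]
    print(round(sensitivity(threat), 3), round(sensitivity(benign), 3))
```

With these toy values the first token sequence scores well above the second, because the verb-object pair ("explode", "bomb") forms a trigger event whose boosted weights dominate the weighted mean. In practice the weights, the trigger definition and any decision threshold would have to be tuned on labeled data.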

Key words: sensitive trigger events, part-of-speech sequence, sensitive information recognition, text similarity
