Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (5): 236-241. DOI: 10.3778/j.issn.1002-8331.1609-0353

• Engineering and Applications •

Research on sensitive information filtering methods for Uyghur text

薛朋强,鲜  英,努尔布力,吾守尔·斯拉木   

  1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046
  • Publication date: 2018-03-01 Release date: 2018-03-13

Sensitive information filtering algorithm based on Uyghur text information network research

XUE Pengqiang, XIAN Ying, Nurbol, Wushour Silamu   

  1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Online: 2018-03-01 Published: 2018-03-13

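Abstract: In this era of explosive information growth, harmful online content pervades daily life. To build a clean network environment and to detect and filter sensitive information in Uyghur text on the web, this work analyzes the characteristics of Uyghur, applies stemming and encoding to Uyghur text, and combines a DFA with a decision tree to propose a corresponding filtering method for Uyghur sensitive information. Stemming and encoding resolve problems such as the varied writing order and forms of Uyghur and its tendency to become garbled in storage. Drawing on the properties of decision trees, the transcoded Uyghur information is stored in decision tree nodes, and the child nodes are arranged in order according to a specific encoding. As a result, the detection range can be narrowed when Uyghur text is filtered for sensitive information, which improves the efficiency of the algorithm.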

Keywords: sensitive information filtering, deterministic automaton, Uyghur filtering, decision tree

Abstract: In an era of explosive information growth, harmful online content pervades daily life. To help build a clean network environment by detecting and filtering sensitive information on Uyghur web pages, this paper analyzes the characteristics of Uyghur, applies stemming and encoding to the text, and combines a deterministic finite automaton (DFA) with a decision tree to propose a filtering method for Uyghur sensitive information. Stemming and encoding address the varied writing order and letter forms of Uyghur and its tendency to become garbled in storage. The encoded Uyghur text is then stored in the nodes of a decision tree, whose child nodes are arranged in order of their codes. This narrows the detection range when Uyghur text is checked for sensitive information and improves the efficiency of the algorithm.

Key words: sensitive information filtering, deterministic finite automaton, Uyghur filtering, decision tree
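
The idea summarized in the abstract (encode the text, store the codes in tree nodes, keep each node's children ordered by code, and walk the tree like a DFA) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the stemming step is omitted, the encoding is assumed to be plain Unicode code points, the "decision tree" is interpreted here as a prefix tree, and all names (`Node`, `encode`, `build_tree`, `find_sensitive`, `mask`) are hypothetical.

```python
from bisect import bisect_left


class Node:
    """One automaton state; outgoing edges are kept sorted by code."""

    def __init__(self):
        self.codes = []        # sorted integer codes of outgoing edges
        self.children = []     # child nodes, parallel to self.codes
        self.terminal = False  # True if a sensitive term ends at this node

    def child(self, code):
        # Binary search over the sorted codes narrows the lookup range.
        i = bisect_left(self.codes, code)
        if i < len(self.codes) and self.codes[i] == code:
            return self.children[i]
        return None

    def add_child(self, code):
        # Insert at the sorted position so the ordering is preserved.
        i = bisect_left(self.codes, code)
        if i < len(self.codes) and self.codes[i] == code:
            return self.children[i]
        node = Node()
        self.codes.insert(i, code)
        self.children.insert(i, node)
        return node


def encode(text):
    # Assumption: each character maps to its Unicode code point.
    # The paper's actual encoding (and its stemming step) is not shown here.
    return [ord(ch) for ch in text]


def build_tree(terms):
    """Store every encoded sensitive term along a path from the root."""
    root = Node()
    for term in terms:
        node = root
        for code in encode(term):
            node = node.add_child(code)
        node.terminal = True
    return root


def find_sensitive(text, root):
    """Return (start, end) spans of the longest term found at each start."""
    codes = encode(text)
    spans = []
    for start in range(len(codes)):
        node, end = root, None
        for pos in range(start, len(codes)):
            node = node.child(codes[pos])
            if node is None:
                break  # no transition: the automaton rejects this prefix
            if node.terminal:
                end = pos + 1
        if end is not None:
            spans.append((start, end))
    return spans


def mask(text, root, fill="*"):
    """Replace every matched span with a single-character filler."""
    chars = list(text)
    for start, end in find_sensitive(text, root):
        chars[start:end] = fill * (end - start)
    return "".join(chars)
```

Calling `build_tree` once on the sensitive-word list and then `mask` on each incoming text keeps the per-character cost at one binary search over a node's children, which is one plausible reading of how ordered child nodes "narrow the detection range".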