Research on Filtering Algorithm for Sensitive Information in Multi-form Uyghur

doi:10.3778/j.issn.1002-8331.1901-0195

Abstract

Existing research on detecting and filtering sensitive information in Uyghur is limited to traditional Uyghur. Uyghur text on the Internet, however, now shows a "one language, two scripts" pattern in which traditional Uyghur and Latin Uyghur coexist. A sensitive information filtering algorithm that covers both forms therefore has practical significance for network security and social stability in Xinjiang and for the overall goal of lasting stability. The Unicode encoding characteristics of Latin Uyghur and traditional Uyghur are studied, and ULTC (Uyghur Latin Traditional Conversion), a code conversion algorithm between the two, is proposed. Using ULTC, Latin Uyghur sensitive information is added to the existing traditional Uyghur sensitive information corpus to construct the multi-form Uyghur sensitive information corpus ULSC (Uyghur Latin Sensitive Corpus). Based on ULSC, a method for calculating multi-form Uyghur sensitivity values is proposed, and LUDT (Latin Uyghur Decision Tree), a multi-form sensitive information decision tree that integrates traditional Uyghur and Latin Uyghur, is constructed. Based on LUDT, the multi-form Uyghur Sensitive Information Filtering (USF) algorithm is proposed. Experimental results show that the USF algorithm achieves a high recall rate.

Key words: traditional Uyghur, Latin Uyghur, sensitive information, decision tree
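
The general idea of handling two scripts with one lexicon can be illustrated with a minimal sketch. It is not the authors' ULTC, LUDT, or USF: the mapping table is a small, hypothetical subset of an Arabic-script to Latin-script correspondence, the function names are invented for illustration, the lexicon holds benign placeholder words ("kitab", "mektep") rather than real corpus entries, and plain substring matching stands in for the decision tree.

```python
# Hypothetical, partial table: Arabic-script Uyghur code points mapped to
# Latin-script spellings. A real converter would need the full alphabet and
# rules for digraphs; this subset only covers the placeholder words below.
ARABIC_TO_LATIN = {
    "\u0627": "a",  # alef
    "\u06d5": "e",  # ae
    "\u0628": "b",  # beh
    "\u067e": "p",  # peh
    "\u062a": "t",  # teh
    "\u0643": "k",  # kaf
    "\u0645": "m",  # meem
    "\u0649": "i",  # alef maksura, used for "i" in Uyghur
    "\u0626": "",   # hamza carrier: dropped in this sketch (assumption)
}


def to_latin(text: str) -> str:
    """Map known Arabic-script letters to Latin; leave other characters as-is."""
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in text).lower()


def find_terms(text: str, lexicon: set[str]) -> list[str]:
    """Return lexicon terms found in the text after normalizing to Latin script."""
    normalized = to_latin(text)
    return sorted(term for term in lexicon if term in normalized)


# Benign placeholder lexicon, stored once in Latin form.
LEXICON = {"kitab", "mektep"}

# One term written in Arabic script, the other in Latin script.
mixed = "\u0645\u06d5\u0643\u062a\u06d5\u067e we kitab"
print(find_terms(mixed, LEXICON))
```

Normalizing every input to a single script means the lexicon only has to be stored once, which is one plausible reason to place a conversion step before matching.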

Yibulayin·Wusiman, GUO Wenqiang, YU Kai. Research on Filtering Algorithm for Sensitive Information in Multi-form Uyghur[J]. Computer Engineering and Applications, 2020, 56(10): 127-133.
