Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (16): 125-133. DOI: 10.3778/j.issn.1002-8331.2005-0212

• Network, Communication and Security •

基于近邻成分分析的WebShell特征处理算法研究

周爱君,努尔布力,艾壮,肖中正   

  1. 新疆大学 信息科学与工程学院,乌鲁木齐 830046
  2. 新疆大学 网络中心,乌鲁木齐 830046
  • 出版日期:2021-08-15 发布日期:2021-08-16

Research on WebShell Feature Processing Algorithm Based on Neighborhood Component Analysis

ZHOU Aijun, NURBOL, AI Zhuang, XIAO Zhongzheng   

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  2. Network Centre, Xinjiang University, Urumqi 830046, China
  • Online:2021-08-15 Published:2021-08-16

摘要:

为解决WebShell样本在文本向量化处理中出现的“维度灾难”和检测效果差的问题,提出了基于近邻成分分析(Neighborhood Component Analysis,NCA)的WebShell特征处理算法。算法通过NCA自动化学习投影矩阵,在保留全局信息的同时完成高维特征空间的约减,为避免过于依赖总体训练样本,采用ReliefF特征选择方法从局部信息的角度进一步优化特征处理,提高WebShell模型检测性能。实验表明,基于近邻成分分析的WebShell特征处理方法能有效检测WebShell,并在准确率、召回率上优于大多数传统特征处理算法的WebShell检测模型。

关键词: WebShell, 特征处理, 近邻成分分析, ReliefF算法

Abstract:

To address the curse of dimensionality and the poor detection performance that arise when WebShell samples are vectorized as text, this paper proposes a WebShell feature processing algorithm based on Neighborhood Component Analysis (NCA). The algorithm uses NCA to learn a projection matrix automatically, reducing the high-dimensional feature space while retaining global information. To avoid over-reliance on the training set as a whole, the ReliefF feature selection method is then applied to further refine the features from the perspective of local information, improving the detection performance of the WebShell model. Experiments show that the proposed method detects WebShells effectively and outperforms WebShell detection models built on most traditional feature processing algorithms in accuracy and recall.

Key words: WebShell, feature processing, neighborhood component analysis, ReliefF algorithm
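
The ReliefF stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only ReliefF-style feature weighting, it visits every sample instead of drawing a random subset, and it assumes numeric features (for example, the columns left after an NCA projection). The names `relieff_scores` and `top_features` and the toy data are hypothetical and chosen purely for illustration.

```python
def relieff_scores(X, y, n_neighbors=2):
    """Return one ReliefF-style weight per feature (illustrative sketch).

    X: list of numeric feature rows; y: list of class labels.
    A feature gains weight when it differs between a sample and its nearest
    neighbours from other classes (misses), and loses weight when it differs
    between a sample and its nearest neighbours of the same class (hits).
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over the scaled features.
        return sum(diff(a, b, j) for j in range(d))

    classes = sorted(set(y))
    prior = {c: sum(1 for label in y if label == c) / n for c in classes}

    weights = [0.0] * d
    for i in range(n):
        for c in classes:
            # Nearest neighbours of sample i inside class c, excluding i.
            idx = [t for t in range(n) if t != i and y[t] == c]
            idx.sort(key=lambda t: dist(X[i], X[t]))
            near = idx[:n_neighbors]
            if not near:
                continue
            if c == y[i]:
                coef = -1.0  # hits pull the weight down
            else:
                # Misses push the weight up, scaled by the class prior.
                coef = prior[c] / (1.0 - prior[y[i]])
            for j in range(d):
                avg = sum(diff(X[i], X[t], j) for t in near) / len(near)
                weights[j] += coef * avg / n
    return weights


def top_features(weights, k):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]  # e.g. 0 = benign script, 1 = WebShell

weights = relieff_scores(X, y, n_neighbors=2)
selected = top_features(weights, 1)
```

On this toy input the discriminative feature receives the larger weight, so `selected` keeps feature 0 and drops the noisy one. In a full pipeline the rows of `X` would come from the dimensionality-reduction step described in the abstract rather than being written by hand.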