Using dimension reduction approach to identify malicious JavaScript

doi:10.3778/j.issn.1002-8331.1808-0098

Computer Engineering and Applications, 2018, Vol. 54, Issue (21): 20-24.

LIU Pengrui, SONG Lipeng

Research Institute of Big Data and Network Security, School of Big Data, North University of China, Taiyuan 030051, China

Publication date: 2018-11-01; Release date: 2018-10-30

Abstract

Applying an N-gram model to JavaScript code produces a high-dimensional feature space for the recognition algorithm. To address this problem, this paper proposes an efficient dimension reduction method. The method uses a TF-IDF-like model to compute the weight of each feature separately in normal samples and in malicious samples, then reduces the dimension based on the degree of difference between a feature's weights in the two classes. Using several recognition algorithms, the paper compares the proposed method with dimension reduction based on Principal Component Analysis (PCA). The experimental results support two conclusions. First, at the same feature dimension, recognition algorithms built on the proposed method achieve better recognition performance than those built on PCA. Second, once the retained dimension exceeds a certain threshold, the time cost of the proposed method grows much more slowly than that of PCA as the retained dimension increases.

Key words: dimension reduction, TF-IDF-like model, feature difference degree, JavaScript, Principal Component Analysis (PCA)
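The selection step described in the abstract can be sketched roughly as follows. The abstract does not give the exact TF-IDF-like formula or the definition of the difference degree, so the weight below (share of occurrences in the class multiplied by a log-scaled sample frequency) and the absolute-difference ranking are assumptions made for illustration only. The function names `char_ngrams`, `class_weights`, `select_features`, and `vectorize` are hypothetical, and character-level N-grams are likewise an assumed choice.

```python
import math
from collections import Counter


def char_ngrams(code, n=2):
    """Split a script into overlapping character n-grams."""
    return [code[i:i + n] for i in range(len(code) - n + 1)]


def class_weights(samples, n=2):
    """TF-IDF-like weight of every n-gram within one class.

    Assumed formula (not taken from the paper):
    weight = (share of the class's n-gram occurrences)
             * log(1 + fraction of the class's samples containing it)
    """
    tf = Counter()  # total occurrences of each n-gram in the class
    df = Counter()  # number of samples in which each n-gram appears
    for code in samples:
        grams = char_ngrams(code, n)
        tf.update(grams)
        df.update(set(grams))
    total = sum(tf.values())
    return {g: (tf[g] / total) * math.log(1 + df[g] / len(samples))
            for g in tf}


def select_features(benign, malicious, k, n=2):
    """Keep the k n-grams whose weights differ most between the classes."""
    wb = class_weights(benign, n)
    wm = class_weights(malicious, n)
    diff = {g: abs(wb.get(g, 0.0) - wm.get(g, 0.0))
            for g in set(wb) | set(wm)}
    # Sort by descending difference; break ties alphabetically.
    return sorted(diff, key=lambda g: (-diff[g], g))[:k]


def vectorize(code, features, n=2):
    """Map a script to a count vector over the retained n-grams."""
    counts = Counter(char_ngrams(code, n))
    return [counts[g] for g in features]


# Toy samples, invented for illustration.
benign = ["var total = a + b;", "function add(a, b) { return a + b; }"]
malicious = ["eval(unescape('%61%6c'))", "document.write(unescape('%3c'))"]
features = select_features(benign, malicious, k=8)
vector = vectorize(malicious[0], features)
```

Unlike PCA, a selection of this kind keeps a subset of the original N-gram features rather than projecting onto new axes, so reducing a new sample only requires counting the retained N-grams.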

LIU Pengrui, SONG Lipeng. Using dimension reduction approach to identify malicious JavaScript[J]. Computer Engineering and Applications, 2018, 54(21): 20-24.

