Research on GAN-SDAE-RF Model for Network Intrusion Detection

doi:10.3778/j.issn.1002-8331.2007-0264

Abstract:

Traditional machine learning methods achieve low detection rates for rare attack classes when handling massive, high-dimensional, imbalanced data. To address this problem, an intrusion detection model that combines deep learning with the random forest algorithm is proposed. A conventional random forest (RF) suffers from low classification accuracy, poor stability, and low detection rates for rare attack classes on high-dimensional and imbalanced data, so a generative adversarial network (GAN) and a stacked denoising autoencoder (SDAE) are introduced to improve it. Samples of the rare attack classes are fed into the GAN to generate new attack samples, which alleviates the imbalanced distribution of network intrusion data in the sample set. The deep stacked SDAE then extracts the distribution patterns of the network data layer by layer and combines the coefficient penalty and reconstruction error of each encoding layer to identify the features in the high-dimensional data that relate to intrusion behavior. The decision trees of the forest are built on the dimension-reduced feature data. Experimental results on the UNSW-NB15 data set show that, compared with the SVM, KNN, CNN, LSTM, and DBN methods, GAN-SDAE-RF improves overall detection accuracy by 9.39% on average and reduces the false positive rate (FPR) and false negative rate (FNR) by 9% and 15.24% on average. Detection rates on the minority classes Analysis, Shellcode, Backdoor, and Worms increase by 26.8%, 27.98%, 27.85%, and 39.97%, respectively.

Key words: deep learning, generative adversarial network, stacked denoising autoencoder, random forest
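The abstract reports results as overall accuracy, FPR, FNR, and per-class detection rate, but does not state the formulas used. The sketch below shows the standard definitions of these quantities in plain Python, as an illustration only. It rests on several assumptions: that accuracy, FPR, and FNR are computed on a binary attack-versus-normal basis, that the benign class is labelled `"Normal"`, and that the detection rate of a class means its recall. The function names `binary_rates` and `detection_rate_per_class` are hypothetical and do not come from the paper.

```python
from collections import Counter


def binary_rates(y_true, y_pred, normal_label="Normal"):
    """Accuracy, FPR and FNR, treating every non-normal label as an attack.

    Assumption: the paper's overall metrics are binary (attack vs. normal);
    the abstract does not confirm this.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        truth_attack = truth != normal_label
        pred_attack = pred != normal_label
        if truth_attack and pred_attack:
            tp += 1  # attack flagged as some attack
        elif truth_attack:
            fn += 1  # attack missed (predicted normal)
        elif pred_attack:
            fp += 1  # normal traffic flagged as an attack
        else:
            tn += 1  # normal traffic passed as normal
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def detection_rate_per_class(y_true, y_pred):
    """Per-class detection rate (recall): exact label matches / class size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / count for label, count in totals.items()}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = ["Normal", "Normal", "Normal", "Normal",
              "Worms", "Worms", "Shellcode", "Backdoor"]
    y_pred = ["Normal", "Normal", "Normal", "Worms",
              "Worms", "Normal", "Shellcode", "Shellcode"]
    print(binary_rates(y_true, y_pred))
    print(detection_rate_per_class(y_true, y_pred))
```

In the toy data, a Backdoor sample predicted as Shellcode still counts as a detected attack in the binary rates but lowers the Backdoor detection rate. This is why per-class rates for minority classes can differ sharply from the overall figures.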

AN Lei, HAN Zhonghua, LIN Shuo, SHANG Wenli. Research on GAN-SDAE-RF Model for Network Intrusion Detection[J]. Computer Engineering and Applications, 2021, 57(21): 155-164.
