Encrypted Malicious Traffic Detection Method Based on Transfer Learning

doi:10.3778/j.issn.1002-8331.2106-0143

Abstract

Abstract: Existing methods for detecting encrypted malicious traffic need a large number of accurately labeled samples for training in order to achieve good detection performance. In real network environments, however, encrypted traffic is difficult to label correctly because its content is not visible. To address this problem, an encrypted malicious traffic detection method based on transfer learning is proposed. EfficientNet-B0, a model pre-trained on the ImageNet dataset, is transferred to an encrypted traffic dataset for the first time: its convolutional layer structure and parameters are preserved, and its fully connected layers are replaced and retrained. Transfer learning thereby enables high detection performance under small-sample conditions. With an end-to-end framework design, the method extracts features directly from raw traffic data and then performs detection and fine-grained classification, which avoids a complicated manual feature extraction process. Experimental results show that the method achieves 99.87% accuracy for binary classification of normal and malicious traffic and 98.88% accuracy for fine-grained classification of encrypted malicious traffic. Furthermore, when the number of training samples per traffic class is reduced to 100, it still reaches a fine-grained classification accuracy of 96.35%.

Key words: encrypted malicious traffic detection, transfer learning, EfficientNet, few-shot, encrypted traffic
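
The two ideas in the abstract, feeding raw traffic bytes to an image model and retraining only the classification head of a pre-trained EfficientNet-B0, can be illustrated with a minimal sketch. It is not the authors' implementation. The 32 x 32 image size, the zero padding, the replication of one grayscale plane into three channels, the dropout rate, and the function names `flow_to_image` and `build_detector` are all assumptions made for illustration; the abstract specifies none of them. The model part assumes a recent PyTorch and torchvision installation.

```python
from typing import List

IMAGE_SIDE = 32  # assumed image side length; the abstract does not state one


def flow_to_image(payload: bytes, side: int = IMAGE_SIDE) -> List[List[List[float]]]:
    """Map the first side*side bytes of a flow to a 3 x side x side array in [0, 1]."""
    n = side * side
    # Truncate long flows and zero-pad short ones (assumed padding scheme).
    clipped = payload[:n].ljust(n, b"\x00")
    plane = [[clipped[r * side + c] / 255.0 for c in range(side)] for r in range(side)]
    # Repeat the grayscale plane three times because ImageNet models expect RGB input.
    return [[row[:] for row in plane] for _ in range(3)]


def build_detector(num_classes: int, pretrained: bool = True):
    """EfficientNet-B0 with frozen convolutional layers and a new, trainable head."""
    # Imported lazily so the preprocessing above works without PyTorch installed.
    from torch import nn
    from torchvision.models import EfficientNet_B0_Weights, efficientnet_b0

    weights = EfficientNet_B0_Weights.IMAGENET1K_V1 if pretrained else None
    model = efficientnet_b0(weights=weights)
    for param in model.features.parameters():
        param.requires_grad = False  # preserve the pre-trained convolutional parameters
    in_features = model.classifier[1].in_features  # 1280 for EfficientNet-B0
    # Replace the ImageNet head; only these new layers are trained (assumed layout).
    model.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(in_features, num_classes))
    return model
```

In this sketch, `build_detector(2)` would correspond to the binary normal/malicious task, and a larger `num_classes` to fine-grained classification. Only parameters with `requires_grad=True`, that is, the new head, would be passed to the optimizer.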

ZHANG Surong, CHEN Bo, BU Youjun, LU Xiangyu, SUN Jia. Encrypted Malicious Traffic Detection Method Based on Transfer Learning[J]. Computer Engineering and Applications, 2022, 58(17): 130-138.
