Computer Engineering and Applications, 2021, Vol. 57, Issue (7): 107-114. DOI: 10.3778/j.issn.1002-8331.2003-0430

• Network, Communication and Security •

Research on Malicious TLS Traffic Identification Based on Hybrid Neural Network

WEI Jihong, ZHENG Rongfeng, LIU Jiayong   

  1. College of Cybersecurity, Sichuan University, Chengdu 610065, China
  2. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Online: 2021-04-01 Published: 2021-04-02

Abstract:

Traditional machine learning methods for identifying malicious TLS traffic depend heavily on expert experience, and their identification and classification results are unsatisfactory. To address this, a Hybrid Neural Network Identification Model (HNNIM) is proposed for identification and classification. The model consists of two layers: the first extracts features, and the second performs identification and classification. In the first layer, the extracted features come in two parts: one part is mined automatically by a deep neural network; the other is selected according to expert experience and then further screened by a deep neural network. The second layer aggregates the features screened by the first layer and uses a fully connected deep neural network for further learning and fitting. Based on an analysis of a large number of TLS traffic samples, two sources are selected as the feature space: the ClientHello and ServerHello messages in TLS traffic, and TCP protocol interaction information. Experimental results show that, on the malicious TLS traffic identification task, HNNIM achieves an F1 score of 0.989 on malicious samples, which is 0.016, 0.016, 0.019, and 0.043 higher than the random forest, SVM, XGBoost, and convolutional neural network models, respectively. On the multi-class classification task, its average accuracy is 89.28%, which is 9.92%, 9.09%, 11.31%, and 7.03% higher than the random forest, SVM, XGBoost, and convolutional neural network models, respectively.
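
The two-layer structure described in the abstract can be pictured with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the use of concatenation for aggregation, the byte normalization, and the names `HybridSketch`, `auto_branch`, `expert_branch`, and `classifier` are all illustrative assumptions, and the weights are random and untrained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=True):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HybridSketch:
    """Hypothetical two-branch model; all sizes are assumptions."""

    def __init__(self, n_raw, n_expert, n_classes, seed=0):
        rng = random.Random(seed)
        # First layer, part 1: features mined automatically from raw bytes.
        self.auto_branch = [init_layer(n_raw, 16, rng), init_layer(16, 8, rng)]
        # First layer, part 2: expert-selected features, screened by a network.
        self.expert_branch = [init_layer(n_expert, 8, rng)]
        # Second layer: fully connected network over the aggregated features.
        self.classifier = [init_layer(8 + 8, 16, rng),
                           init_layer(16, n_classes, rng)]

    def forward(self, raw_bytes, expert_features):
        a = [b / 255.0 for b in raw_bytes]  # scale bytes to [0, 1]
        for layer in self.auto_branch:
            a = dense(a, layer)
        e = list(expert_features)
        for layer in self.expert_branch:
            e = dense(e, layer)
        h = a + e  # aggregation by concatenation (assumption)
        h = dense(h, self.classifier[0])
        return softmax(dense(h, self.classifier[1], relu=False))


# Usage: 32 raw bytes (starting with a TLS handshake record header)
# and 4 made-up expert features, classified into 2 classes.
model = HybridSketch(n_raw=32, n_expert=4, n_classes=2)
probs = model.forward(raw_bytes=[0x16, 0x03, 0x01] + [0] * 29,
                      expert_features=[0.5, 0.1, 0.0, 1.0])
```

The output `probs` is a probability vector over the classes. In a trained model of this shape, both branches and the classifier would be fitted jointly, so the expert-feature branch learns which hand-picked features to keep.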

Keywords: TLS traffic identification, malicious encrypted traffic, traditional machine learning, deep neural network, automatic feature mining