Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (18): 275-284. DOI: 10.3778/j.issn.1002-8331.2306-0022

• Network, Communication and Security •

Research on Improvement of Generative Adversarial Networks for Network Traffic Data Augmentation

ZHANG Yawen, ZHANG Yuchen, WU Yue, LI Cheng

  1. Department of Cryptographic Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
  • Publication date: 2024-09-15  Release date: 2024-09-13

Abstract: Network traffic data are high-dimensional and complex, so traffic samples generated by generative adversarial networks (GANs) tend to be of poor quality. To address this problem, a conditional GAN with a projection discriminator based on double generators (PD-DcGAN) is proposed and applied to augmenting minority-class traffic. First, a discrete generator based on the Gumbel-sigmoid distribution is proposed; it produces a smooth, differentiable approximation of discrete data for generating discrete features and runs in parallel with a continuous-data generator, and the two outputs are concatenated to capture the overall data distribution. Second, conditional information and feature information are fused through an inner product, which avoids the enlarged hypothesis space incurred by traditional methods and alleviates instability during model training. Third, a gradient penalty term is introduced into the loss function to keep the discriminator gradient within a bounded range, effectively mitigating gradient explosion. The model is evaluated on the UNSW-NB15 dataset from two perspectives: the quality of the generated samples and the effectiveness of the model. Experimental results show that, compared with other data augmentation methods, PD-DcGAN improves accuracy, precision, recall, and F1 score by 2.72%, 1.72%, 1.87%, and 1.16% on average, respectively. Compared with the original dataset, detection performance for hard-to-detect minority classes such as Analysis, Backdoors, Exploits, Shellcode, and Worms improves markedly, rising from below 1% to 7.93%, 6.53%, 15.72%, 14.02%, and 10.91%, respectively.

Key words: generative adversarial network, double generator structure, data augmentation, imbalanced dataset, network traffic classification
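
Two mechanisms named in the abstract, the Gumbel-sigmoid relaxation for discrete features and the inner-product fusion of condition and feature information, can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation. The function names `gumbel_sigmoid` and `projection_logit`, the temperature `tau`, the toy logits, weights, and class embeddings, and the use of the raw sample as the discriminator's feature vector are all assumptions made for illustration. The gradient penalty is not shown because it requires automatic differentiation.

```python
import math
import random


def gumbel_sigmoid(logit, tau, rng):
    """Return a relaxed Bernoulli sample in [0, 1] for one binary feature."""
    # Clamp u away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    # The difference of two Gumbel(0, 1) draws is Logistic(0, 1) noise.
    noise = math.log(u) - math.log(1.0 - u)
    z = (logit + noise) / tau
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def projection_logit(features, label, class_embed, w, b):
    """Unconditional linear score plus <class embedding, features>."""
    # Assumption: `features` stands in for the discriminator's hidden features.
    uncond = sum(wi * fi for wi, fi in zip(w, features)) + b
    cond = sum(ei * fi for ei, fi in zip(class_embed[label], features))
    return uncond + cond


rng = random.Random(0)
tau = 0.5  # temperature; smaller values push outputs toward 0 or 1

# Hypothetical generator outputs: three discrete-feature logits and two
# continuous values, concatenated into one synthetic traffic record.
discrete_part = [gumbel_sigmoid(a, tau, rng) for a in (2.0, -1.0, 0.3)]
continuous_part = [rng.gauss(0.0, 1.0) for _ in range(2)]
sample = continuous_part + discrete_part

class_embed = [[0.1] * 5, [-0.2] * 5]  # one toy embedding per class
score = projection_logit(sample, 1, class_embed, w=[0.3] * 5, b=0.0)
print(sample, score)
```

In this sketch the class label enters the discriminator score only through one inner product with the feature vector, rather than by widening the input with a label vector, which is one plausible reading of how the inner-product form keeps the hypothesis space from growing.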