Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (4): 143-149. DOI: 10.3778/j.issn.1002-8331.2008-0354

• Pattern Recognition and Artificial Intelligence •

Study on Generative Adversarial Network for Data Anomaly Detection

ZHUANG Yuesheng, LIN Shanling, LIN Zhixian, ZHANG Yongai, GUO Tailiang   

  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
  2. Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China
  3. School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian 362200, China
  • Online: 2022-02-15  Published: 2022-02-15

生成对抗网络在数据异常检测中的研究

庄跃生,林珊玲,林志贤,张永爱,郭太良   

  1. 福州大学 物理与信息工程学院,福州 350116
  2. 中国福建光电信息科学与技术创新实验室,福州 350116
  3. 福州大学 先进制造学院,福建 晋江 362200

Abstract: Many detection models perform poorly because the data are class-imbalanced and anomalous data are complex. To address this, the paper proposes a data anomaly detection model based on generative adversarial networks (GAN). The model first uses InfoGAN to generate class-balanced samples of normal and anomalous data, then builds an inference network that acts as a label generator for the generated and original samples. The inference network is fine-tuned with a second GAN, which keeps the generated samples consistent with their labels. Finally, the generated sample-label pairs are classified by a random forest whose hyperparameters are searched with the Hyperband algorithm, which further optimizes the inference network. The model is compared with five traditional machine learning models on four real datasets. The results show that it classifies anomalous data effectively once the classes are balanced, without collecting additional anomalous samples. On the MNIST dataset, its AUC is 0.14 higher than that of the K-nearest neighbor (KNN) model, and its overall performance exceeds that of the traditional machine learning models compared.

Key words: data anomaly detection, InfoGAN, random forest, Hyperband

摘要: 针对许多检测模型受到数据不平衡和异常数据的复杂性等因素影响问题,提出一种以生成对抗网络(generative adversarial network,GAN)为基础的数据异常检测方法。该方法利用InfoGAN网络训练生成正常数据和异常数据,构造一个推理神经网络作为生成数据与原始数据的标签生成器,之后利用第二个GAN网络对推理网络精调,保证生成的样本和其标签对应;最后将生成样本与标签输入随机森林分类,通过Hyperband算法寻找随机森林最优超参,对推理网络进一步优化。在四个真实数据集上与五种传统机器学习模型进行实验对比,实验结果表明,该模型无需收集更多异常样本,达到数据平衡就可以有效进行数据异常检测。在Mnist数据集中,该模型的AUC值相比于K近邻(K-nearest neighbor,KNN)方法提高0.14,并且综合性能优于传统机器学习模型。

关键词: 数据异常检测, InfoGAN, 随机森林, Hyperband
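
Note on the Hyperband search named in the abstract: the abstract does not state which budget settings the authors used, so the following is only a minimal sketch of the standard Hyperband bracket schedule, not the paper's implementation. The function name `hyperband_schedule`, the maximum resource of 81, and the reduction factor `eta = 3` are all assumptions chosen for illustration. What a unit of "resource" means for a random forest (for example, the number of trees or training samples) is likewise an assumption.

```python
def hyperband_schedule(max_resource: int, eta: int = 3):
    """Return Hyperband brackets as lists of (n_configs, resource) rounds.

    max_resource and eta are illustrative assumptions, not values from the paper.
    """
    # s_max = floor(log_eta(max_resource)), computed with integers
    # to avoid floating-point error in the logarithm.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta^s / (s + 1)),
        # written as an integer ceiling division.
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i                       # configurations kept in round i
            r_i = max_resource * eta ** i / eta ** s  # resource given to each of them
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets


if __name__ == "__main__":
    # Each bracket trades off "many configurations, little resource"
    # against "few configurations, full resource".
    for rounds in hyperband_schedule(81, eta=3):
        print(rounds)
```

Within each bracket, every round evaluates the surviving configurations (for instance, by validation AUC of the random forest), keeps roughly the best 1/eta of them, and gives the survivors eta times more resource. The evaluation step itself is omitted here because the abstract does not describe it.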