Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (17): 93-99. DOI: 10.3778/j.issn.1002-8331.1908-0298

• Big Data and Cloud Computing •

Application Research of Deep Auto Encoder in Data Anomaly Detection

ZHANG Changhua, ZHOU Xiongtu, ZHANG Yong’ai, YAO Jianmin, GUO Tailiang, YAN Qun

  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
  2. RichSense Electronic Technology Co., Ltd., Jinjiang, Fujian 362200, China
  • Online: 2020-09-01 Published: 2020-08-31

Abstract:

Auto Encoder (AE) networks usually require normal data for training, which limits their use in data anomaly detection. To address this limitation, this paper proposes an unsupervised data anomaly detection method based on a Deep Auto Encoder (DAE) network model. The model draws on Principal Component Analysis (PCA): after each reconstruction, the AE output is subtracted from the input data to isolate the anomalous part. The input data is thus split into a normal part and an anomalous part. The normal part is reconstructed by the AE, and the anomalous part is optimized by the proximal method. The whole model is trained with the Alternating Direction Method of Multipliers (ADMM), and the results are output once the predetermined number of training iterations is reached. The DAE model is compared with eight machine learning models and the AE model on seven real datasets. The results show that the DAE model can be trained effectively without normal data as input and can prevent overfitting. Its overall performance is better than that of the traditional machine learning models and the AE model, and its AUC value is the best on four of the datasets. On the MNIST dataset, the AUC value of the DAE model is 10.93% higher than that of the Isolation Forest (IF) method.

Key words: data anomaly detection, auto encoder network, Deep Auto Encoder (DAE) network, Area Under the Curve (AUC)
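
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a truncated SVD (a linear, PCA-like reconstruction) stands in for the deep AE, the proximal step is taken to be l1 soft-thresholding, a plain alternation replaces the full ADMM training, and the data is synthetic. All function names and parameters (`decompose`, `rank`, `lam`, `n_iter`) are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_1 (element-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def linear_reconstruct(data, rank):
    """Stand-in for the AE (assumption): rank-limited reconstruction via SVD."""
    mean = data.mean(axis=0, keepdims=True)
    u, sv, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean + (u[:, :rank] * sv[:rank]) @ vt[:rank]


def decompose(x, rank=2, lam=0.5, n_iter=20):
    """Alternately split x into a reconstructed normal part and an anomalous part."""
    s = np.zeros_like(x)
    for _ in range(n_iter):
        recon = linear_reconstruct(x - s, rank)  # reconstruct the cleaned data
        s = soft_threshold(x - recon, lam)       # proximal step on the residual
    return recon, s


def auc(scores, labels):
    """AUC as the probability that an anomaly scores above a normal sample."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


# Synthetic data (assumption): low-rank "normal" rows plus 10 corrupted rows.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
x += 0.05 * rng.normal(size=x.shape)
labels = np.zeros(500, dtype=int)
labels[:10] = 1
x[:10] += rng.normal(scale=5.0, size=(10, 10))

recon, s = decompose(x)
scores = np.linalg.norm(s, axis=1)  # larger anomalous part -> more suspicious
print("AUC:", auc(scores, labels))
```

No labels are used during the decomposition. They enter only in the final AUC computation, which mirrors the unsupervised setting the abstract describes.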