Computer Engineering and Applications, 2020, Vol. 56, Issue 17: 78-85. DOI: 10.3778/j.issn.1002-8331.1908-0188

Anomaly Detection Method Based on Multi-resolution Grid

LIU Wenfen, MU Xiaodong, HUANG Yuehua   

1. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. College of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China

Online: 2020-09-01  Published: 2020-08-31

Abstract:

As an important data mining technique, anomaly detection is widely used in data analysis. However, existing anomaly detection algorithms often require different parameter settings for different data sets to achieve good detection results, and their running time on large data sets is unsatisfactory. Grid-based anomaly detection solves the time-efficiency problem well for low-dimensional data, but its accuracy depends heavily on the grid partition scale and the density threshold; these parameters are not robust and do not generalize well across different types of data sets. To address these problems, this paper proposes an anomaly detection method based on a multi-resolution grid. The method first introduces a robust submatrix partition parameter that divides high-dimensional data into several low-dimensional subspaces, so that anomaly detection is carried out on the subspaces and the method remains applicable to high-dimensional data. It then partitions the data with a sequence of grids from coarse to fine, weighs the local anomaly factors of each data point across the different grid scales, and finally outputs a ranking of global outlier scores. Experimental results show that the newly introduced submatrix partition parameter is robust, that the method adapts well to high-dimensional data, and that it achieves good detection performance on multiple public data sets, offering an efficient solution to anomaly detection in high-dimensional data.
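The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that the submatrix partition parameter (called `sub_dim` here, a hypothetical name) is the number of consecutive columns per subspace, that each resolution `bins` splits every subspace axis into equal-width cells, and that a point's "local anomaly factor" at one scale can be approximated by the average occupancy of non-empty cells divided by the occupancy of the point's own cell. Factors are averaged with equal weights over all subspaces and resolutions (also an assumption), and points are ranked by the resulting score. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def split_subspaces(n_dims: int, sub_dim: int) -> List[List[int]]:
    """Group column indices into consecutive blocks of at most sub_dim columns."""
    return [list(range(s, min(s + sub_dim, n_dims))) for s in range(0, n_dims, sub_dim)]


def cell_ids(data: Sequence[Sequence[float]], cols: List[int], bins: int) -> List[Tuple[int, ...]]:
    """Map each point to its grid cell in subspace `cols`, with `bins` equal-width cells per axis."""
    lo = [min(row[c] for row in data) for c in cols]
    hi = [max(row[c] for row in data) for c in cols]
    ids = []
    for row in data:
        cell = []
        for c, l, h in zip(cols, lo, hi):
            width = (h - l) / bins or 1.0  # constant column: put every point in cell 0
            cell.append(min(int((row[c] - l) / width), bins - 1))  # clamp max value into last cell
        ids.append(tuple(cell))
    return ids


def multires_scores(data, sub_dim: int = 2, resolutions=(2, 4, 8)) -> List[float]:
    """Average a per-cell sparsity factor over all subspaces and coarse-to-fine grids."""
    subspaces = split_subspaces(len(data[0]), sub_dim)
    scores = [0.0] * len(data)
    for cols in subspaces:
        for bins in resolutions:  # coarse grid first, then finer grids
            ids = cell_ids(data, cols, bins)
            counts = Counter(ids)
            mean_count = len(data) / len(counts)  # average occupancy of non-empty cells
            for i, cid in enumerate(ids):
                # > 1 when the point's cell is sparser than an average occupied cell
                scores[i] += mean_count / counts[cid]
    total = len(subspaces) * len(resolutions)
    return [s / total for s in scores]


def rank_outliers(data, **kw) -> List[int]:
    """Return point indices ordered from most to least anomalous."""
    scores = multires_scores(data, **kw)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
```

For example, on 25 tightly clustered four-dimensional points plus one distant point, every grid places the cluster in a single cell and the distant point alone in another, so `rank_outliers` lists the distant point first. A real implementation would need the paper's own definitions of the partition parameter, the local anomaly factor, and the cross-scale weighting.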

Key words: anomaly detection, multi-resolution grid, high-dimensional data, subspace, data mining