计算机工程与应用 ›› 2019, Vol. 55 ›› Issue (17): 89-94.DOI: 10.3778/j.issn.1002-8331.1806-0278

• 大数据与云计算 • 上一篇    下一篇

基于网格查询的局部离群点检测算法

牛少章,欧毓毅,凌捷,顾国生   

  1. 广东工业大学 计算机学院,广州 510006
  • 出版日期:2019-09-01 发布日期:2019-08-30

Local Outlier Detection Algorithm Based on Grid Query

NIU Shaozhang, OU Yuyi, LING Jie, GU Guosheng   

  1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2019-09-01 Published:2019-08-30

摘要: 针对基于密度的局部离群因子算法(LOF),需要计算距离矩阵来进行[k]近邻查寻,算法时间复杂度高,不适合大规模数据集检测的问题,提出基于网格查询的局部离群点检测算法。算法利用距离目标网格中的数据点最近的[k]个其他数据点,一定在该目标网格或在该目标网格的最近邻接网格中这一特性,来改进LOF算法的邻域查询操作,以此减少LOF算法在邻域查询时的计算量。实验结果证明,提出的LOGD算法在与原LOF算法具有基本相同的检测准确率的情况下,能够有效地降低离群点检测的时间。

关键词: 局部离群因子, [k]近邻, 距离矩阵, 网格, 记忆性, 邻域查询

Abstract: In view of the density based Local Outlier Factor(LOF) algorithm, it is necessary to calculate the distance matrix for the [k] nearest neighbor search, but the time complexity of this algorithm is high, and it is not suitable for the large-scale data set detection. The local outlier detection algorithm based on the grid query(LOGD) is proposed. This algorithm uses the characteristic that the nearest [k] data points from the data points in the target grid, which must be memorable in the target grid or the nearest adjacent grid of the target grid, to improve the neighborhood query operation of the LOF algorithm, and reduce the distance calculation from the neighborhood query. Experimental results show that the algorithm has the same detection accuracy as the original LOF algorithm, and effectively reduces the time of outlier detection.

Key words: local outlier factor, [k]-nearest neighbors, distance matrix, grid, memory, neighborhood query