Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (1): 89-98.DOI: 10.3778/j.issn.1002-8331.2012-0352

• Theory, Research and Development • Previous Articles     Next Articles

Two-Stage Outlier Detection Algorithm Based on Markov Random Walk

XI Tingting, ZHAO Xujun, SU Jianhua   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online:2022-01-01 Published:2022-01-06

基于马尔科夫随机游走的两阶段离群检测算法

席婷婷,赵旭俊,苏建花   

  1. 太原科技大学 计算机科学与技术学院,太原 030024

Abstract: In the neighborhood-based outlier detection algorithm, the selection and determination of parameters is an important issue. Unreasonable parameter selection leads to a significant decrease in the performance of the algorithm. In order to reduce the influence of parameters on outlier detection, a two-stage outlier detection algorithm based on Markov random wandering is proposed, which can effectively reduce the influence of parameters on detection results without affecting the efficiency of the algorithm. The algorithm firstly conducts a uniform sampling strategy to generate a series of triangulation graphs, and the removal rules are introduced to get the topology structure of the node, so that the algorithm obtains a transition probability matrix defined by the node connectivity, which reduces the computational complexity and running time of the algorithm. Secondly, the weighted voting principle is used to redefine the restart vector, and the average deviation of the smoothly distributed vector on different graphs is used as the outlier score, which effectively improves the accuracy of the algorithm. Finally, the synthetic data set and UCI data set are used to verify that the algorithm has a higher accuracy rate than the existing algorithm.

Key words: outlier detection, DLS-delaunary, Markov random walk

摘要: 基于邻域的离群点检测算法中,参数的选择与确定是一个重要的问题,不合理的参数选择导致算法的性能显著下降。为减少参数对于离群点检测的影响,提出了一种基于马尔科夫随机游走的两阶段离群检测算法,可以在不影响算法效率的基础上,有效降低参数对检测结果的影响。该算法采用均匀采样策略生成一系列三角剖分图,并引入移除规则得到节点的拓扑结构,从而获得由节点连通性定义的转移概率矩阵,有效减少了算法的计算量和运行时间;其采用加权投票原则重新定义重启向量,并将不同图上得到的平稳分布向量的平均偏差值作为离群点分数,有效地提高了算法的准确性。采用合成数据集以及UCI数据集,验证了该算法与现有的算法相比有更高的准确率。

关键词: 离群点检测, DLS-三角剖分, 马尔科夫随机游走