Computer Engineering and Applications, 2021, Vol. 57, Issue (12): 132-136. DOI: 10.3778/j.issn.1002-8331.2003-0223

• Pattern Recognition and Artificial Intelligence •

Anomaly Detection Algorithm Based on Kernel Density Fluctuation

ZHANG Bowen (张博文), LIU Zhi (刘智), SANG Guoming (桑国明)

  1. Information Science and Technology College, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Online: 2021-06-15 Published: 2021-06-10

Abstract:

Anomaly detection is an important research direction in data mining. Most current density-based anomaly detection algorithms rely on assumptions about the sample distribution, are sensitive to the nearest-neighbor parameter k, and lack the ability to detect collective anomalies. To address these problems, a kernel density fluctuation algorithm based on kernel density estimation is proposed. A kernel density fluctuation factor is defined that jointly evaluates the fluctuation of kernel density values inside and outside the neighborhood of a data point; it is used as the detection index, and detection rules are formulated to identify anomalies. This index takes both the local and the global characteristics of a data point into account and also helps to find collective anomalies. Experimental results on data sets show that the proposed algorithm achieves better detection results and is considerably robust to its parameters.

Key words: data mining, anomaly detection, kernel density estimation, kernel density fluctuation, sensitivity analysis
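
The abstract does not give the formula for the kernel density fluctuation factor, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes a Gaussian kernel with a fixed bandwidth and uses a hypothetical score, `illustrative_fluctuation_score`, that compares a point's kernel density with the densities of its k nearest neighbors (inside the neighborhood) and of all remaining points (outside the neighborhood). The function names, the bandwidth parameter, and the way the two terms are combined are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_at_points(X, bandwidth):
    """Gaussian kernel density estimate evaluated at every sample point.

    Returns the densities and the pairwise squared distances.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Normalizing constant of an isotropic d-dimensional Gaussian kernel.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm
    return kernel.mean(axis=1), sq_dist


def illustrative_fluctuation_score(X, k, bandwidth):
    """Hypothetical anomaly score (an assumption, NOT the paper's factor).

    For each point, average the absolute density difference to its k
    nearest neighbors and to all other points, then divide by the
    point's own density so that sparse points score higher.
    """
    density, sq_dist = gaussian_kde_at_points(X, bandwidth)
    n = density.shape[0]
    if not 1 <= k < n - 1:
        raise ValueError("k must satisfy 1 <= k < n - 1")
    # Sort neighbors by distance; position 0 is the point itself
    # (or an exact duplicate of it).
    order = np.argsort(sq_dist, axis=1)
    scores = np.empty(n)
    for i in range(n):
        inside = order[i, 1:k + 1]   # k nearest neighbors
        outside = order[i, k + 1:]   # every remaining point
        local = np.mean(np.abs(density[inside] - density[i]))
        far = np.mean(np.abs(density[outside] - density[i]))
        scores[i] = (local + far) / density[i]
    return scores


# Usage: a tight cluster plus one isolated point (synthetic data).
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.5, size=(30, 2))
X = np.vstack([cluster, [[8.0, 8.0]]])
scores = illustrative_fluctuation_score(X, k=5, bandwidth=1.0)
print("index with the highest score:", int(np.argmax(scores)))
```

Because the score depends on estimated densities and not on raw distances alone, changing k only changes which points count as inside the neighborhood. Whether this yields the robustness to k or the sensitivity to collective anomalies reported in the abstract depends on the paper's actual definition, which this sketch does not reproduce.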