Computer Engineering and Applications (计算机工程与应用) ›› 2016, Vol. 52 ›› Issue (19): 7-11.

• Hot Topics and Reviews •

GSWCLOF: density-based outlier detection algorithm on data stream

LI Shaobo (李少波)1,2, MENG Wei (孟伟)1, QU Jinglei (璩晶磊)1

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
  2. School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
  • Online: 2016-10-01 Published: 2016-11-18

Abstract: Existing outlier detection methods for data streams suffer from low detection accuracy and low execution efficiency. To address these problems, a new density-based outlier detection algorithm named GSWCLOF is proposed, drawing on data mining theory. The algorithm introduces a sliding time window and a grid: within each sliding time window, the grid partitions the data into cells, and information entropy is used to prune and filter the data in all cells, which removes the vast majority of normal data. A local outlier factor then makes the final decision on the remaining data. Experimental results show that the algorithm effectively improves both detection accuracy and execution efficiency.

Key words: data stream outlier detection, sliding window, grid, information entropy, local outlier factor
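
The pipeline described in the abstract (window, grid, pruning, local outlier factor) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the entropy criterion, so the pruning step below substitutes a simple cell-occupancy threshold for it, and a count-based window stands in for the sliding time window. All function names and parameters (`cell_size`, `min_count`, `k`, `threshold`) are hypothetical; only the local outlier factor follows its standard definition.

```python
import math
from collections import defaultdict, deque


def grid_cells(points, cell_size):
    """Group points by the index of the grid cell they fall into."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(x / cell_size) for x in p)].append(p)
    return cells


def prune_dense_cells(cells, min_count):
    """Stand-in for the entropy-based pruning (an assumption, not the paper's rule):
    points in cells holding at least min_count points are treated as normal."""
    return [p for pts in cells.values() if len(pts) < min_count for p in pts]


def knn(i, data, k):
    """Return the k-distance of data[i] and the indices of its k-distance neighbourhood."""
    dists = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lrd(i, data, k):
    """Local reachability density: inverse of the mean reachability distance."""
    _, nbrs = knn(i, data, k)
    reach = [max(knn(j, data, k)[0], math.dist(data[i], data[j])) for j in nbrs]
    return len(nbrs) / sum(reach)  # assumes no exact duplicate points


def lof(i, data, k):
    """Local outlier factor: mean ratio of the neighbours' density to the point's own."""
    _, nbrs = knn(i, data, k)
    return sum(lrd(j, data, k) for j in nbrs) / (len(nbrs) * lrd(i, data, k))


def detect_outliers(window, cell_size=1.0, min_count=3, k=3, threshold=1.5):
    """Prune dense grid cells, then score the surviving candidates with LOF."""
    data = list(window)
    candidates = prune_dense_cells(grid_cells(data, cell_size), min_count)
    return [p for p in candidates if lof(data.index(p), data, k) > threshold]


# A count-based window stands in for the sliding time window.
window = deque(maxlen=6)
for point in [(9.0, 9.0), (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
              (0.3, 0.4), (0.5, 0.5), (5.0, 5.0)]:
    window.append(point)  # the oldest point, (9.0, 9.0), slides out

print(detect_outliers(window))
```

With these toy points, the five clustered points share one dense cell and are pruned, so the local outlier factor is computed only for (5.0, 5.0). Pruning first is what keeps the expensive neighbour search off the bulk of the normal data, which is the efficiency gain the abstract claims.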