Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (3): 58-63. DOI: 10.3778/j.issn.1002-8331.1607-0236

• Theory, Research and Development •

Data stream outlier detection algorithm based on K-means

HAN Chong1, YUAN Yingshan2, MEI Tao2, GENG Huiling2

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2. College of Tongda, Nanjing University of Posts and Telecommunications, Yangzhou, Jiangsu 225127, China
  • Online: 2017-02-01 Published: 2017-05-11

Abstract: To address outlier mining in data streams, this paper proposes DOKM, an outlier detection algorithm built on K-means clustering that uses a distance-based criterion to decide whether a data point is an outlier. The algorithm adaptively adjusts the sliding window size according to the results of concept drift detection on the data stream, and detects outliers within the resulting window. A series of experiments and comparisons with other outlier detection algorithms shows that DOKM detects outliers effectively on both artificial and real data sets.

Key words: concept drift, data stream, K-means clustering, variable sliding window, outlier detection
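The abstract does not give the details of DOKM, so the following is only a minimal sketch of the general idea it describes: cluster the points in the current window with K-means, then apply a distance-based criterion to flag outliers. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `kmeans`, `window_outliers` and `resize_window`, the deterministic initialisation from the first `k` points, the threshold of mean plus `factor` standard deviations of the distance to the nearest centroid, and the policy of halving the window after a detected drift.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means over a list of equal-length tuples.

    Assumption: centroids start at the first k points, which keeps the
    sketch deterministic; a real implementation would initialise better.
    """
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def window_outliers(window, k=2, factor=2.0):
    """Flag points whose distance to the nearest centroid exceeds
    mean + factor * std of those distances (an assumed criterion)."""
    centroids = kmeans(window, k)
    dists = [min(math.dist(p, c) for c in centroids) for p in window]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    threshold = mean + factor * std
    return [p for p, d in zip(window, dists) if d > threshold]


def resize_window(size, drift_detected, min_size=50, max_size=1000):
    """Assumed policy: halve the window after a detected drift so stale
    data is dropped, otherwise grow it by about 10%."""
    if drift_detected:
        return max(min_size, size // 2)
    return min(max_size, size + max(1, size // 10))


if __name__ == "__main__":
    window = [
        (0, 0), (10, 10),  # the first two points seed the two clusters
        (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 11), (11, 10), (11, 11), (10.5, 10.5),
        (10.5, 25),  # far from both clusters
    ]
    print(window_outliers(window))
    print(resize_window(200, drift_detected=True))
```

With a single global threshold, the outlier itself pulls its cluster's centroid and inflates the standard deviation, so `factor` has to stay modest for small windows. How drift is detected is left out entirely, since the abstract does not say which detector the paper uses.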