Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (9): 102-115. DOI: 10.3778/j.issn.1002-8331.2405-0334

• Theory, Research and Development •

Kolmogorov Inequality Based Drift Detection Methods for Data Stream

HAN Meng, MENG Fanxing, LI Chunpeng, ZHANG Ruihua, HE Feifei, DING Jian

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission (IGIPLab), North Minzu University, Yinchuan 750021, China
  • Online: 2025-05-01 Published: 2025-04-30

Abstract: In real-world data environments, data distributions often change over time, a phenomenon known as concept drift. Concept drift can significantly degrade the performance of a previously trained classification model, so the model must be adjusted promptly when drift occurs in order to keep learning effective. This paper explores the potential of the Kolmogorov inequality for concept drift detection. An error-rate-based Kolmogorov drift test strategy is proposed, and concept drift detection methods built on the Kolmogorov inequality are designed to detect both abrupt and gradual concept drift in data streams. A tail instance adjustment strategy is also proposed, which reduces the influence of old instances in the drift detection sample set and thereby further reduces detection delay. Experiments show that, compared with classical and state-of-the-art drift detectors, the proposed algorithms achieve the best classification accuracy. In drift detection performance, they rank among the top in both false detection rate and detection delay, striking a good balance between the two. They also perform well in runtime. In an overall comparison across these four metrics, the proposed algorithms outperform the other algorithms, meeting the expectations of the study.

Key words: concept drift, drift detection, data stream, classification, Kolmogorov inequality
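
For readers unfamiliar with the inequality named in the title, the following is a minimal illustrative sketch and not the authors' algorithm. Kolmogorov's inequality states that for independent zero-mean random variables with finite variances and partial sums S_k, P(max_{1<=k<=n} |S_k| >= lambda) <= Var(S_n) / lambda^2. The sketch assumes that, without drift, the classifier's 0/1 error indicators are independent with a known reference error rate `p_ref`, so Var(S_n) = n * p_ref * (1 - p_ref). The function names, the `delta` parameter, and the fixed-window design are all hypothetical choices made for illustration. The tail instance adjustment strategy described in the abstract is not reproduced here.

```python
import math
from itertools import accumulate


def kolmogorov_threshold(n, p_ref, delta):
    """Return lambda such that the Kolmogorov bound n*p*(1-p)/lambda^2 equals delta.

    Assumption: errors are i.i.d. Bernoulli(p_ref) when no drift is present.
    """
    return math.sqrt(n * p_ref * (1.0 - p_ref) / delta)


def drift_detected(errors, p_ref, delta=0.05):
    """Flag drift on one window of 0/1 prediction errors.

    errors: sequence of 0 (correct) or 1 (misclassified) for recent instances.
    p_ref:  reference error rate, e.g. estimated on an earlier stable window
            (hypothetical input; estimating it makes the bound approximate).
    delta:  upper bound on the false alarm probability under the assumption above.
    """
    errors = list(errors)
    n = len(errors)
    if n == 0:
        return False
    # Centered partial sums S_k = sum_{i<=k} (e_i - p_ref).
    partial_sums = accumulate(e - p_ref for e in errors)
    max_deviation = max(abs(s) for s in partial_sums)
    # Strict comparison avoids a false alarm when the threshold is exactly 0.
    return max_deviation > kolmogorov_threshold(n, p_ref, delta)
```

In use, a caller would pass the most recent window of error indicators together with an error rate measured before that window. Because the Kolmogorov bound is distribution-free, the threshold is conservative, which is one reason a practical detector may need additional mechanisms to keep detection delay low.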