Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (13): 124-129. DOI: 10.3778/j.issn.1002-8331.2003-0329

• Pattern Recognition and Artificial Intelligence •

Imbalanced Data Stream Classification Algorithm for Dynamic Data Chunk

WANG Junhong (王俊红), GUO Yahui (郭亚慧)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Taiyuan 030006, China
  • Online: 2021-07-01 Published: 2021-06-29

Abstract:

Classification of dynamic imbalanced data is an important research problem at the intersection of online learning and class-imbalance learning; it concerns data streams whose class distribution is highly skewed. Such problems are common in practice, for example in fault diagnosis for real-time control and monitoring systems and in intrusion detection for computer networks. Because a dynamic data stream exhibits both concept drift and class imbalance, a data stream classification algorithm must handle concept drift and class imbalance together. To address this, a method is proposed that processes imbalanced data while detecting concept drift. The method uses the Kappa coefficient to detect concept drift, then checks the balance rate, and updates the classifier with an imbalanced-data classification method. Experimental results show that the algorithm achieves good classification performance on imbalanced data streams under several evaluation metrics.

Key words: data streams, imbalanced data, concept drift, Kappa coefficient, classification algorithm
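
The abstract names two per-chunk quantities, the Kappa coefficient and the balance rate, without giving the decision rule. The following is a minimal sketch of how they might be computed for one data chunk, not the authors' implementation. The function names, the drop-based drift rule, and the thresholds `drift_drop` and `min_balance` are hypothetical choices made for illustration; the balance rate is assumed here to be the minority-to-majority class count ratio.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0:
        raise ValueError("empty chunk")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label and prediction frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs; treated as 0 by assumption.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def balance_rate(labels):
    """Assumed definition: minority class count divided by majority class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / max(counts.values())


def check_chunk(y_true, y_pred, prev_kappa, drift_drop=0.1, min_balance=0.5):
    """Hypothetical rule: flag drift when kappa falls by more than drift_drop
    relative to the previous chunk, and flag imbalance below min_balance."""
    kappa = cohen_kappa(y_true, y_pred)
    drift = prev_kappa is not None and (prev_kappa - kappa) > drift_drop
    imbalanced = balance_rate(y_true) < min_balance
    return kappa, drift, imbalanced
```

In a chunk-based loop, a caller would presumably pass each new chunk's labels and the current classifier's predictions to `check_chunk`, then rebuild or update the classifier with an imbalance-aware method when either flag is set.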