Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (12): 80-84.

• Big Data and Cloud Computing •

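An ensemble classification algorithm for data streams based on concept recurrence

YIN Shaohong, ZHANG Panpan

  1. Tianjin Polytechnic University, Tianjin 300387
  • Publication date: 2016-06-15 Release date: 2016-06-14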

Ensemble classification algorithm for data stream based on repeatability of concept

YIN Shaohong, ZHANG Panpan   

  1. Tianjin Polytechnic University, Tianjin 300387, China
  • Online:2016-06-15 Published:2016-06-14

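Abstract: Research on classifying data streams with concept drift has already produced many results, but most of it does not fully consider the case where concepts reappear in the data stream. This consumes substantial computation and memory and increases the likelihood of classification errors. To address this, an ensemble classification algorithm for data streams is proposed based on the recurrence of concepts. The algorithm uses ensemble classification to handle concept drift in the data stream, but during learning it does not delete temporarily invalid concepts and their corresponding base classifiers. Instead, it stores their basic information so that they can be called on later, and it can predict the upcoming concept from the transition relationships between concepts, improving classification accuracy and time efficiency together. Experimental results verify the effectiveness of the algorithm.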

Key words: data mining, data stream, ensemble classification, concept drift, recurrence

Abstract: Research on data stream classification under concept drift has achieved a great deal. However, most existing methods neglect the situation in which concepts recur in the data stream, which leads to high computational cost and large memory overhead and also harms classification accuracy. To solve this problem, this paper proposes an ensemble classification algorithm for data streams based on the repeatability of concepts, which applies ensemble classification to handle concept drift in the data stream. On the one hand, instead of deleting temporarily invalid concepts and their corresponding base classifiers during learning, the algorithm stores their essential information for later reuse. On the other hand, it predicts the oncoming concept according to the transitions between concepts. The proposed algorithm can therefore improve both classification accuracy and time efficiency. Experimental results demonstrate the effectiveness of the new algorithm.

Key words: data mining, data stream, ensemble classification, concept drift, repeatability
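
The two ideas named in the abstract, keeping retired concepts with their base classifiers and predicting the next concept from observed transitions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `ConceptPool`, the methods `switch_to` and `predict_next`, and the rule "most frequent successor of the current concept" are hypothetical and are not taken from the paper, whose drift detection, stored information, and prediction rule are not specified in the abstract.

```python
from collections import defaultdict


class ConceptPool:
    """Hypothetical store of concepts, their classifiers, and transition counts."""

    def __init__(self):
        self.classifiers = {}  # concept id -> stored base classifier
        # transitions[a][b] = number of observed drifts from concept a to concept b
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current = None

    def switch_to(self, concept_id, classifier=None):
        """Record a drift to concept_id and return the classifier to use.

        A new classifier is stored on first sight; a recurring concept
        reuses the stored one instead of being retrained.
        """
        if classifier is not None:
            self.classifiers[concept_id] = classifier
        elif concept_id not in self.classifiers:
            raise KeyError(f"no stored classifier for concept {concept_id!r}")
        if self.current is not None and self.current != concept_id:
            self.transitions[self.current][concept_id] += 1
        self.current = concept_id
        return self.classifiers[concept_id]

    def predict_next(self):
        """Return the most frequent successor of the current concept, or None."""
        successors = self.transitions.get(self.current)
        if not successors:
            return None
        return max(successors, key=successors.get)


# Toy usage: classifiers are stand-in callables, concept ids are assumed labels.
pool = ConceptPool()
pool.switch_to("A", lambda x: 0)
pool.switch_to("B", lambda x: 1)
pool.switch_to("A")  # concept A recurs, its stored classifier is reused
pool.switch_to("B")
pool.switch_to("A")
next_concept = pool.predict_next()
```

A first-order frequency count is only one plausible way to model transitions between concepts; it is used here because it is the simplest structure that matches the abstract's wording.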