Ensemble classification algorithm for data stream based on repeatability of concept

Computer Engineering and Applications, 2016, Vol. 52, Issue 12: 80-84.

YIN Shaohong, ZHANG Panpan

Tianjin Polytechnic University, Tianjin 300387, China

Online: 2016-06-15 Published: 2016-06-14

Abstract

Abstract: Research on classifying data streams with concept drift has produced many results, but most methods do not fully account for concepts that recur in the stream. Ignoring recurrence wastes computation and memory and increases the likelihood of classification errors. To address this, this paper proposes an ensemble classification algorithm for data streams based on the repeatability of concepts. The algorithm uses ensemble classification to handle concept drift, but during learning it does not delete concepts that have temporarily become invalid, nor their corresponding base classifiers. Instead, it stores their essential information for later reuse, and it predicts the upcoming concept from the transition relationships between concepts. The algorithm thereby improves both classification accuracy and time efficiency. Experimental results verify its effectiveness.

Key words: data mining, data stream, ensemble classification, concept drift, repeatability
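
The two ideas in the abstract, keeping classifiers for inactive concepts and predicting the next concept from transitions, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how concepts are identified or how transitions are modeled, so the class name `ConceptRepository`, its methods, and the use of first-order transition counts are all assumptions made for illustration.

```python
from collections import defaultdict


class ConceptRepository:
    """Hypothetical store of past concepts and concept-to-concept transition counts."""

    def __init__(self):
        # concept id -> stored base classifier (any object); entries are never deleted
        self.classifiers = {}
        # transitions[a][b] = number of times concept b directly followed concept a
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current = None

    def store(self, concept_id, classifier):
        """Keep the classifier for a concept so it can be reused if the concept recurs."""
        self.classifiers[concept_id] = classifier

    def switch_to(self, concept_id):
        """Record a drift from the current concept to concept_id."""
        if self.current is not None and self.current != concept_id:
            self.transitions[self.current][concept_id] += 1
        self.current = concept_id

    def predict_next(self):
        """Return the most frequent successor of the current concept, or None if unknown."""
        followers = self.transitions.get(self.current)
        if not followers:
            return None
        return max(followers, key=followers.get)


# Assumed toy sequence of detected concepts; strings stand in for trained classifiers.
repo = ConceptRepository()
for cid in ["A", "B", "A", "B", "A", "C", "A"]:
    if cid not in repo.classifiers:
        repo.store(cid, f"classifier_for_{cid}")
    repo.switch_to(cid)

next_concept = repo.predict_next()
print(next_concept, repo.classifiers.get(next_concept))
```

In this toy run the stream returns to concept A, whose most frequent recorded successor is B, so the stored classifier for B could be prepared before the drift occurs instead of being retrained from scratch.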

YIN Shaohong, ZHANG Panpan. Ensemble classification algorithm for data stream based on repeatability of concept[J]. Computer Engineering and Applications, 2016, 52(12): 80-84.
