Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (2): 107-113. DOI: 10.3778/j.issn.1002-8331.1608-0043


Abnormal data flow oriented selection integration method of multiple classifiers

YANG Rongze, LIU Yi   

  1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2018-01-15 Published: 2018-01-31

面向异常数据流的多分类器选择集成方法

杨融泽,柳  毅   

  1. 广东工业大学 计算机学院,广州 510006

Abstract: Traditional multiple-classifier selection algorithms incur large computing and storage overhead. In addition, the stability with which multiple classifiers predict abnormal data flows is an important factor in handling concept drift. This paper introduces an improved decision contour matrix and support entropy to resolve the fuzzy degree of difference between classifier ensembles. Support entropy serves as the input measure for the degree of difference, which makes the calculation of differences between classifier ensembles more stable and efficient. On this basis, an abnormal data flow detection method based on diversity integration is proposed and its algorithm is implemented. The method is applied in the anomaly classifier selection module and consists of three steps: constructing the decision contour matrix, integrating the support entropy, and measuring the dissimilarity of classifier ensembles. Experimental results show that the BDMS algorithm outperforms the other algorithms in both the accuracy and the stability of abnormal traffic prediction. Because the classifier training time is about 10⁻² s, the method can basically meet the real-time requirements of data traffic detection.
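
The abstract names three steps but does not state their formulas, so the following is only a minimal sketch under loudly stated assumptions, not the paper's BDMS algorithm. It assumes the decision contour matrix has one row of class supports per classifier, that support entropy is the Shannon entropy of a row, and that ensemble dissimilarity is the mean absolute pairwise gap between those entropies. All function names are hypothetical.

```python
import math
from itertools import combinations


def decision_matrix(classifiers, x):
    """Step 1 (assumed form): one row per classifier, one column per class.

    Each classifier is a hypothetical callable that returns its class
    supports (for example, probabilities) for the sample x.
    """
    return [list(clf(x)) for clf in classifiers]


def support_entropy(supports):
    """Step 2 (assumption): Shannon entropy, in bits, of one row of supports."""
    return sum(p * math.log2(1.0 / p) for p in supports if p > 0)


def ensemble_dissimilarity(matrix):
    """Step 3 (assumption): mean absolute pairwise gap in support entropy."""
    entropies = [support_entropy(row) for row in matrix]
    pairs = list(combinations(entropies, 2))
    if not pairs:  # fewer than two classifiers: nothing to compare
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Toy classifiers over the classes (normal, abnormal); purely illustrative.
def sure(x):
    return [1.0, 0.0]


def unsure(x):
    return [0.5, 0.5]


matrix = decision_matrix([sure, unsure], x=None)
score = ensemble_dissimilarity(matrix)
```

A selection module could score candidate ensembles this way and keep the most dissimilar ones. Note that under this assumed measure two confident classifiers that disagree on the class still score as identical, so the paper's actual measure may well be defined differently.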

Key words: selection integration, abnormal data flow, decision contour matrix, support entropy, difference measure

摘要: 传统的多分类器选择算法产生较大的计算和存储开销。另外,多分类器对异常数据流的预测稳定性是解决概念漂移的重要因素。通过引入改进的决策轮廓矩阵和支持熵解决了每个分类器集合之间模糊差异度问题,并将支持熵作为差异度度量的输入衡量标准,使分类器集合之间的差异度计算更加稳定高效,并在此基础上提出了一种基于差异度集成的异常数据流检测方法并实现其算法;该方法应用在异常分类器选择模块,主要包括三个步骤:构建决策轮廓矩阵、整合支持熵、分类器集合差异度度量。实验结果表明,该算法对异常流量的预测精度和稳定性相比其他算法较好,由于分类器训练时间达到10⁻² s左右,基本上能够适应数据流量检测的实时性需求。

关键词: 选择集成, 异常数据流, 决策轮廓矩阵, 支持熵, 差异度量