Computer Engineering and Applications, 2020, Vol. 56, Issue (6): 86-91. DOI: 10.3778/j.issn.1002-8331.1811-0323

• Network, Communication and Security •

A Balanced Traffic Classification Method Using Ensemble Learning and Resampling

顾兆军,吴优,赵春迪,周景贤   

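  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300
  2. Sino-European Institute of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300
  3. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300
  • Publication date: 2020-03-15  Release date: 2020-03-13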

Resampling and Boosting Techniques for Balanced Traffic Classification

GU Zhaojun, WU You, ZHAO Chundi, ZHOU Jingxian   

  1. Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China, Tianjin 300300, China
  2. Sino-European Institute of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  3. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Online: 2020-03-15  Published: 2020-03-13

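Abstract (Chinese-language version):

To address the problem that data imbalance degrades classification performance in traditional machine-learning-based traffic classification, a resampling-based gradient boosting tree algorithm is proposed. The algorithm uses statistical features of traffic data, optimizes the feature set with a backtracking search strategy, and designs tree-structure parameters suited to traffic classification in order to construct an optimal model. It then applies the LightGBM algorithm combined with resampling to correct the data imbalance and runs classification tests. Experiments show that the algorithm improves classification of imbalanced data and is both stable and fast.

Key words: machine learning, ensemble learning, data imbalance, network traffic, resampling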

Abstract:

Because data imbalance degrades the accuracy of machine-learning-based traffic classification, this paper proposes RES-LGBM, a traffic classification algorithm based on ensemble learning and resampling. The algorithm uses statistical features of traffic flows and optimizes the feature set with a backtracking search method. Once the optimal tree structure has been determined, RES-LGBM is applied to correct the data imbalance and the classification results are tested. The test results show that the algorithm improves the classification of imbalanced data with high efficiency and stability.

Key words: machine learning, ensemble learning, data imbalance, network flow, resampling
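
The abstract does not say which resampling scheme RES-LGBM uses, so the following is only a minimal sketch of one common option, random oversampling of minority classes, written in plain Python. The function name `random_oversample`, the toy feature tuples, and the class labels are illustrative assumptions and are not taken from the paper. In a pipeline like the one described, the balanced output would then be passed to a gradient boosting classifier such as LightGBM.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest one.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw the missing samples with replacement from the class itself.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy flow records: (mean packet size in bytes, flow duration in seconds).
# The values are made up purely for illustration.
X = [(1400, 0.2), (1350, 0.3), (1420, 0.1), (1380, 0.4), (90, 5.0)]
y = ["bulk", "bulk", "bulk", "bulk", "interactive"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.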