计算机工程与应用 (Computer Engineering and Applications) ›› 2018, Vol. 54 ›› Issue (10): 31-38. DOI: 10.3778/j.issn.1002-8331.1801-0429

• 热点与综述 (Hot Topics and Reviews) •

基于t分布混合模型的半监督网络流分类方法

董育宁,朱善胜,赵家杰   

  1. 南京邮电大学 通信与信息工程学院,南京 210003
  • 出版日期:2018-05-15 发布日期:2018-05-28

Semi-supervised network traffic classification based on t-distribution mixture model

DONG Yuning, ZHU Shansheng, ZHAO Jiajie   

  1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Online: 2018-05-15 Published: 2018-05-28

摘要: 针对传统高斯分布容易受到数据样本边缘值和离群点噪声的影响,改用t分布替代原有的高斯混合模型,并使用期望最大化(Expectation Maximization,EM)算法对网络流数据样本进行t分布混合模型的建模。为降低EM算法的迭代次数,对t分布混合模型进行了改进,用理论和实验验证了算法的有效性,并对网络多媒体业务流进行了分类研究。实验表明,提出的算法有较高的分类准确率,拟合的模型要优于传统的K-Means算法和传统的高斯混合模型的EM算法。

关键词: 网络流分类, t分布混合模型, 期望最大化算法, 半监督分类

Abstract: The traditional Gaussian distribution is sensitive to edge values and outlier noise in data samples, so Student's t-distribution is adopted in place of the Gaussian distribution in the mixture model. The Expectation Maximization (EM) algorithm is used to fit a t-distribution mixture model (TMM) to network traffic data samples. To reduce the number of EM iterations, an improved model, the limited t-distribution mixture model (LTMM), is presented, and its effectiveness is demonstrated by theoretical analysis and experiments. The method is then applied to the classification of network multimedia traffic flows. Experiments show that the proposed algorithm achieves higher accuracy than existing methods and that the fitted model is better than those obtained with the K-Means algorithm and with the EM algorithm for the Gaussian mixture model.

Key words: internet traffic classification, t-distribution mixture model, Expectation Maximization (EM) algorithm, semi-supervised learning
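
To make the modelling step in the abstract concrete, the following is a minimal sketch of EM for a univariate t-distribution mixture model. It is not the authors' implementation and it does not reproduce their LTMM variant, whose details are not given on this page. Assumptions made for illustration only: a single scalar feature per flow, a fixed degrees-of-freedom value `nu`, a quantile-based initialisation, and the hypothetical function names `t_logpdf`, `fit_tmm`, and `predict`.

```python
import math


def t_logpdf(x, mu, sigma2, nu):
    """Log-density of a univariate Student's t (location mu, scale sigma2, dof nu)."""
    z = (x - mu) ** 2 / sigma2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - (nu + 1) / 2 * math.log1p(z / nu))


def fit_tmm(xs, k, nu=4.0, n_iter=100):
    """Fit a k-component t mixture by EM with nu held fixed (an assumption)."""
    n = len(xs)
    srt = sorted(xs)
    # Assumed initialisation: evenly spaced quantiles as starting locations.
    mus = [srt[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sigma2s = [var] * k
    pis = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities tau[i][j] and outlier-damping weights u[i][j].
        tau, u = [], []
        for x in xs:
            logp = [math.log(pis[j]) + t_logpdf(x, mus[j], sigma2s[j], nu)
                    for j in range(k)]
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            s = sum(w)
            tau.append([wj / s for wj in w])
            # Points far from a component get a small u, so they barely move it.
            u.append([(nu + 1) / (nu + (x - mus[j]) ** 2 / sigma2s[j])
                      for j in range(k)])
        # M-step: weighted updates of mixing proportions, locations, and scales.
        for j in range(k):
            nj = max(sum(t[j] for t in tau), 1e-12)
            wsum = max(sum(t[j] * v[j] for t, v in zip(tau, u)), 1e-12)
            mus[j] = sum(t[j] * v[j] * x for t, v, x in zip(tau, u, xs)) / wsum
            s2 = sum(t[j] * v[j] * (x - mus[j]) ** 2
                     for t, v, x in zip(tau, u, xs)) / nj
            sigma2s[j] = max(s2, 1e-9)  # floor to avoid a collapsed component
            pis[j] = nj / n
    return pis, mus, sigma2s


def predict(x, pis, mus, sigma2s, nu=4.0):
    """Index of the component with the highest posterior probability for x."""
    scores = [math.log(pis[j]) + t_logpdf(x, mus[j], sigma2s[j], nu)
              for j in range(len(pis))]
    return scores.index(max(scores))
```

The weight `(nu + 1) / (nu + d)` in the E-step, where `d` is the squared scaled distance to a component, is what distinguishes this from Gaussian-mixture EM: it shrinks toward zero for distant samples, which is the robustness to edge values and outliers that the abstract refers to. In a semi-supervised setting, the fitted components would still need to be mapped to traffic classes using the labelled flows; that step is not sketched here.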