Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (9): 90-96. DOI: 10.3778/j.issn.1002-8331.1601-0379

• Big Data and Cloud Computing •

A Comparative Study of Several Quasi-Real-Time IT System Load Forecasting Methods: Based on the R Language

曾会锋,刘一田   

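  1. Information Technology and Communication Company, NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 210003
  • Online: 2017-05-01 Published: 2017-05-15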

Comparison and study of several quasi-real-time IT system workload forecasting methods based on the R language

ZENG Huifeng, LIU Yitian   

  1. Information Technology and Communication Company, NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 210003, China
  • Online: 2017-05-01 Published: 2017-05-15

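Abstract: To find a simple and effective model for forecasting IT system load, a stress-test environment was built on an existing IT system to simulate a gradually increasing load while the system's performance parameters (such as CPU usage, memory usage, and network bandwidth usage) were collected. Several traditional forecasting methods were then used to analyze and predict the collected parameters, and the suitability of each method was compared. The experiments modeled the data with eight methods: linear regression, logarithmic regression, quadratic regression, Holt-Winters smoothing, ARIMA, auto ARIMA from an R package, the mean model, and the median model. R was used to compute the predicted value a fixed interval ahead (e.g. 10 s), and the predictions were then compared with the measured values; the appropriate model was chosen according to the size of the absolute/relative error. The experiments led to the following conclusions: for CPU utilization, which changes sharply, the median model is suitable for quasi-real-time forecasting and gives high average accuracy; for memory usage, which changes slowly, the ARIMA model is suitable; for network bandwidth usage, which varies widely but over a long period, the Holt-Winters method is suitable. These conclusions can serve as a basis for choosing the best method for quasi-real-time continuous forecasting in engineering practice.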

Keywords: IT system, load forecasting, time series analysis, R language application, auto ARIMA, median model

Abstract: To find a simple and effective model for predicting the workload of an IT system (running on a JVM), a stress-test environment is set up on an existing IT system and a gradually increasing workload is simulated while the JVM's performance parameters (CPU usage, memory usage, network bandwidth usage) are sampled. Eight conventional models are then fitted to the resulting time series to continuously predict the probable value ten seconds ahead, with the model parameters computed in R, and the model with the best robustness and prediction accuracy is identified. The eight models are: linear regression, logarithmic regression, quadratic regression, Holt-Winters exponential smoothing, ARIMA, auto ARIMA from the R package by Hyndman, the mean model, and the median model. The results show that a different model suits each of the three performance parameters: for CPU usage, which usually changes quickly, the median model is optimal; for memory usage, which usually changes slowly, the auto ARIMA model and the ARIMA model described in the paper are the two best; for network bandwidth usage, which changes slowly but with large amplitude, the Holt-Winters model is optimal. These conclusions can serve as a basis for choosing the optimal quasi-real-time prediction model in production.

Key words: IT system, workload forecast, time series, R language, Autoregressive Integrated Moving Average (ARIMA), median model
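
The evaluation procedure summarized in the abstract (fit a model to recent samples, predict a fixed interval ahead, and rank models by absolute and relative error) can be illustrated with a minimal sketch. The paper performs its computations in R; the sketch below uses plain Python instead and covers only three of the eight models (mean, median, linear regression). The function names, the window size, the horizon expressed in samples, and the sample series are all illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, median


def mean_model(window, horizon):
    """Forecast the mean of the window (the horizon does not change the value)."""
    return mean(window)


def median_model(window, horizon):
    """Forecast the median of the window; isolated spikes have little effect on it."""
    return median(window)


def linear_model(window, horizon):
    """Fit a least-squares line to the window and extrapolate `horizon` samples ahead."""
    n = len(window)
    x_bar = (n - 1) / 2
    y_bar = mean(window)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(window))
    slope = sxy / sxx
    # The last sample sits at x = n - 1, so the target is x = n - 1 + horizon.
    return y_bar + slope * ((n - 1 + horizon) - x_bar)


def rolling_errors(series, model, window_size, horizon):
    """Slide a window over the series and return (mean absolute error, mean relative error)."""
    abs_errs, rel_errs = [], []
    for end in range(window_size, len(series) - horizon + 1):
        window = series[end - window_size:end]
        actual = series[end + horizon - 1]
        predicted = model(window, horizon)
        abs_errs.append(abs(predicted - actual))
        if actual != 0:  # relative error is undefined when the measured value is zero
            rel_errs.append(abs(predicted - actual) / abs(actual))
    return mean(abs_errs), (mean(rel_errs) if rel_errs else float("nan"))


# Made-up, spiky "CPU-like" samples for illustration only (not data from the paper).
cpu_like = [20, 22, 21, 95, 23, 22, 24, 21, 90, 22, 23, 21, 25, 88, 22, 24, 23, 21, 22, 92]

for name, model in [("mean", mean_model), ("median", median_model), ("linear", linear_model)]:
    mae, mre = rolling_errors(cpu_like, model, window_size=5, horizon=2)
    print(f"{name:>6}: mean abs error = {mae:.2f}, mean rel error = {mre:.2%}")
```

The same `rolling_errors` loop could in principle wrap any of the other models (for example an exponential-smoothing or ARIMA forecaster), as long as each is exposed as a function of the recent window and the horizon.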