Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (5): 79-87. DOI: 10.3778/j.issn.1002-8331.2002-0163


Prediction Model of Execution Time for Batch Application in Spark

LI Shuo, LIANG Yi   

  1. Faculty of Information, Beijing University of Technology, Beijing 100124, China
  • Online: 2021-03-01 Published: 2021-03-02





Predicting the execution time of batch applications in Spark is a key technology for guiding resource allocation and application balance in Spark. However, existing work adopts a unified prediction model for applications with different behavior characteristics and considers only a limited set of factors in model learning, which reduces prediction accuracy. To address these problems, an execution time prediction model for Spark batch applications is proposed that accounts for the diversity of their behavior characteristics. The model first classifies Spark batch applications based on metrics strongly correlated with execution time, and then uses the principal component analysis (PCA) and gradient boosting decision tree (GBDT) algorithms to build an execution time prediction model for each application category. Finally, when an ad hoc application arrives, it is mapped to a specific application category and its execution time is predicted with the corresponding model. Experimental results show that, compared with the unified prediction model, the proposed method reduces the root mean square error and the mean absolute percentage error of the predictions by 32.1% and 33.9% on average, respectively.
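
The abstract names its two error metrics without defining them. As a minimal sketch, assuming the standard definitions of root mean square error and mean absolute percentage error, the reported relative reductions could be computed as follows. The function names and the execution times are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical execution times in seconds (illustrative only, not from the paper).
actual = [100.0, 200.0, 400.0]
unified = [120.0, 170.0, 460.0]        # predictions of a single unified model
per_category = [110.0, 185.0, 430.0]   # predictions of per-category models

rmse_cut = relative_reduction(rmse(actual, unified), rmse(actual, per_category))
mape_cut = relative_reduction(mape(actual, unified), mape(actual, per_category))
print(f"RMSE reduced by {rmse_cut:.1f}%, MAPE reduced by {mape_cut:.1f}%")
```

Because MAPE divides by the actual value, it is undefined when an actual execution time is zero and weights errors on short-running applications more heavily than RMSE does, which is one reason to report the two metrics together.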

Key words: Spark, batch application, classification, prediction


