%0 Journal Article
%A LI Shuo
%A LIANG Yi
%T Prediction Model of Execution Time for Batch Application in Spark
%D 2021
%R 10.3778/j.issn.1002-8331.2002-0163
%J Computer Engineering and Applications
%P 79-87
%V 57
%N 5
%X

Predicting the execution time of batch applications in Spark is a key technique for guiding resource allocation and application balancing in Spark. However, existing work adopts a unified prediction model for applications with different behavior characteristics and considers only a limited set of factors during model learning, which reduces prediction accuracy. To address these problems, an execution time prediction model for Spark batch applications is proposed that accounts for the diversity of their behavior characteristics. The model first classifies Spark batch applications according to metrics strongly correlated with execution time, and then uses the PCA and GBDT algorithms to build an execution time prediction model for each application category. Finally, when an ad-hoc application arrives, it is mapped to a specific application category and its execution time is predicted with the corresponding model. Experimental results show that, compared with the unified prediction model, the proposed method reduces the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the predictions by 32.1% and 33.9% on average, respectively.
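
The two error measures named above, and the way a relative reduction between a unified model and a per-category model could be computed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and the sample execution times are hypothetical, and the standard textbook definitions of RMSE and MAPE are assumed.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical execution times in seconds, invented for illustration only.
actual = [100.0, 200.0, 400.0]
unified_pred = [120.0, 170.0, 460.0]       # one model for all applications
per_category_pred = [110.0, 190.0, 420.0]  # one model per application category

rmse_cut = relative_reduction(rmse(actual, unified_pred),
                              rmse(actual, per_category_pred))
mape_cut = relative_reduction(mape(actual, unified_pred),
                              mape(actual, per_category_pred))
print(f"RMSE reduced by {rmse_cut:.1f}%, MAPE reduced by {mape_cut:.1f}%")
```

The printed figures depend entirely on the invented sample data and bear no relation to the 32.1% and 33.9% reported in the abstract.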

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2002-0163