Computer Engineering and Applications, 2020, Vol. 56, Issue (23): 68-73. DOI: 10.3778/j.issn.1002-8331.1912-0293


Research on Optimization for Iteration-Intensive Applications on Spark

WEI Zhanchen, LIU Xiaoyu, HUANG Qiulan, SUN Gongxing   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2020-12-01 Published: 2020-11-30





Spark is a popular, widely applicable big data processing framework that is easy to use and scales well. However, some problems remain in practical use. For example, in some iteration-intensive computing scenarios, the speedup falls short of expectations, because running on Spark introduces substantial additional overhead that lowers application efficiency. To analyze and reduce this overhead accurately, this paper proposes a Spark efficiency formula in which the additional overhead is measured by the distributed calculation cost and application efficiency is measured by the effective calculation ratio. Based on the formula, the paper also proposes an optimization strategy for iteration-intensive applications on Spark. Test results show that the effective calculation ratio improves by about 0.373 and the execution time is reduced by about 68.2%.

Key words: Spark, optimization for iteration-intensive applications, distributed calculation cost, effective calculation ratio


