Computer Engineering and Applications, 2015, Vol. 51, Issue (21): 64-69.

• Theoretical Research, Research and Development Design •

Cost-based MapReduce workflow optimizers

FENG Qiuyan

  1. Henan University of Economics and Law, Zhengzhou 450000, China
  • Online: 2015-11-01 Published: 2015-11-16

Abstract: Optimization at each layer of the MapReduce stack has its own advantages and disadvantages. This paper first introduces the concepts needed to frame the MapReduce workload optimization problem. It then presents cost-based optimization of MapReduce jobs and the techniques it relies on, and evaluates this approach by comparison with rules of thumb (RoT). Based on the dataflow dependencies and resource dependencies within a workflow, three workflow optimizers are proposed. Cost-based workflow optimization is evaluated, and the workflow optimizers are also evaluated end to end. Finally, experiments measure the optimization overhead of the workflow optimizers, and the advantages and disadvantages of the three optimizers are compared.

Key words: MapReduce workloads, optimization, dataflow dependencies, resource dependencies, workflow optimizer