Computer Engineering and Applications, 2014, Vol. 50, Issue (10): 249-252.

• Engineering and Applications •

Method for computing execution time of jobs in homogeneous Hadoop environments

ZHANG Xiaohong1,2, HAI Linpeng2, JIA Zongpu2, SHEN Jiquan3, ZHAO Wentao2   

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  2. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454003, China
  3. Center of Modern Education, Henan Polytechnic University, Jiaozuo, Henan 454003, China
  • Online: 2014-05-15 Published: 2014-05-14

Abstract: Execution time is one of the important reference factors in job scheduling. This paper analyzes the execution characteristics of jobs in a Hadoop MapReduce environment and proposes a method that estimates a job's execution time from the execution times of its map tasks and reduce tasks. Under certain assumptions, the method obtains these task execution times by pre-executing the job. The method was evaluated on a Linux cluster, and the experimental results show that it estimates job execution times with an error rate of no more than 7%.

Key words: Hadoop MapReduce, execution time, scheduling
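
The abstract does not give the estimation formula, so the following is only a minimal sketch of how per-task times could be combined into a job-level estimate in a homogeneous cluster. It assumes a simple wave-based model that is not taken from the paper: all map tasks take the same time, all reduce tasks take the same time, tasks run in full waves across a fixed number of slots, and the reduce phase starts only after the map phase ends. The names `estimate_job_time`, `error_rate`, `map_slots`, and `reduce_slots` are hypothetical.

```python
import math


def estimate_job_time(num_map_tasks, num_reduce_tasks,
                      map_slots, reduce_slots,
                      map_task_time, reduce_task_time):
    """Estimate job execution time with an assumed wave-based model.

    Assumption (not from the paper): tasks of one kind all take the same
    time, run in waves limited by the slot count, and the reduce phase
    does not overlap the map phase.
    """
    if map_slots <= 0 or reduce_slots <= 0:
        raise ValueError("slot counts must be positive")
    # Number of rounds needed to run every task of each kind.
    map_waves = math.ceil(num_map_tasks / map_slots)
    reduce_waves = math.ceil(num_reduce_tasks / reduce_slots)
    return map_waves * map_task_time + reduce_waves * reduce_task_time


def error_rate(estimated, measured):
    """Relative error of an estimate against a measured execution time."""
    return abs(estimated - measured) / measured


# Hypothetical inputs: task times as they might be captured by pre-executing the job.
estimate = estimate_job_time(
    num_map_tasks=40, num_reduce_tasks=8,
    map_slots=16, reduce_slots=8,
    map_task_time=20.0, reduce_task_time=60.0,
)
print(estimate)                     # 3 map waves * 20 s + 1 reduce wave * 60 s
print(error_rate(estimate, 125.0))  # compared with a made-up measured time
```

The numbers above are illustrative only and are not results from the paper. A real Hadoop job also overlaps shuffle with the map phase and incurs scheduling overhead, which this sketch ignores.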