Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (22): 43-49.


Data access monitoring mechanism in Hadoop platform

WANG Yufeng1, LIANG Yi1, JIN Yi2, LI Guangrui1   

  1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
  2. Beijing Computing Center, Beijing 100094, China
  • Online:2014-11-15 Published:2014-11-13

Hadoop平台数据访问监控机制研究

王玉凤1,梁  毅1,金  翊2,李光瑞1   

  1. 北京工业大学 计算机学院,北京 100124
  2. 北京市计算中心,北京 100094

Abstract: In Hadoop, the task scheduler takes data location into account so that Map tasks can process data locally. To address this characteristic, this paper proposes a novel data access behavior monitoring mechanism for Map tasks. It argues that data access monitoring on the Hadoop platform should serve to improve not only the efficiency of data access but also the execution efficiency of parallel Map/Reduce jobs, and that it is therefore necessary to monitor how evenly data access overhead is balanced across multiple Map tasks executing in parallel. Based on this idea, the granularity and information set of data access monitoring on the Hadoop platform are defined; a master-slave monitoring architecture that builds on Hadoop's existing function modules is presented; and the detailed implementation of the main monitoring function modules is discussed and the experimental results are analyzed.

Key words: Hadoop, Map/Reduce, monitoring, data access

摘要: 针对Hadoop平台数据被任务调度感知,进行本地化处理的新特征,探索Hadoop平台中Map任务数据访问监控机制。提出Hadoop平台数据访问监控不仅应服务于数据存取效率的提升,还应服务于Map/Reduce并行作业执行效率提升的基本思想,并增加对并行执行多Map任务数据访问开销均衡性的监控。基于该思想,定义Hadoop平台数据访问监控的粒度和监控信息组成;依托Hadoop平台现有结构,设计了基于master-slave的监控体系结构,并给出了监控主要功能模块的具体实现技术及测试结果。

关键词: Hadoop, Map/Reduce, 监控, 数据访问