Computer Engineering and Applications, 2016, Vol. 52, Issue (22): 22-25.

• Hot Topics and Reviews •

Research on optimized MapReduce model of Hadoop cloud platform

ZHANG Hong1,2, WANG Xiaoming1, CAO Jie2, MA Yanhong3, GUO Yirong1, WANG Min1

  1. College of Electrical & Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
  2. College of Computer & Communication, Lanzhou University of Technology, Lanzhou 730050, China
  3. State Grid Gansu Electric Company, Lanzhou 730030, China
  • Online: 2016-11-15 Published: 2016-12-02

Abstract: The sequential constraints in the execution mechanism of the MapReduce distributed computing model on the Hadoop platform waste computing resources. To improve fine-grained parallel data processing on each execution node, this paper optimizes the model with Java shared-memory multithreaded programming and proposes MapReduce+OpenMP, a distributed parallel computing model that combines coarse-grained and fine-grained parallelism. The model is evaluated on a four-node Hadoop cluster by analyzing taxi GPS trajectory data sets of different sizes. The experimental results show that the MapReduce+OpenMP model improves computing efficiency on large data sets and is an effective refinement and optimization of the big data processing model of the Hadoop platform.

Key words: Hadoop, MapReduce, OpenMP, distributed, parallel
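
The fine-grained half of the idea described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: it uses no Hadoop API and only shows how one node might process its own input split with several shared-memory threads and then combine the partial results, in the spirit of an OpenMP parallel loop with a reduction. The class name FineGrainedMapSketch, the method countPointsPerTaxi, and the record layout "taxiId,timestamp,latitude,longitude" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: fine-grained, shared-memory parallelism inside one node's map step.
final class FineGrainedMapSketch {

    // Counts GPS points per taxi ID in one input split, using `threads` worker threads.
    // Assumed record layout (not from the paper): "taxiId,timestamp,latitude,longitude".
    // Records are assumed to be well formed, i.e. to contain at least one comma.
    static Map<String, Integer> countPointsPerTaxi(List<String> split, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (split.size() + threads - 1) / threads; // ceiling division
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (int start = 0; start < split.size(); start += chunk) {
                List<String> slice = split.subList(start, Math.min(start + chunk, split.size()));
                // Each worker fills a private map, so no locking is needed while counting.
                partials.add(pool.submit(() -> {
                    Map<String, Integer> local = new HashMap<>();
                    for (String line : slice) {
                        String taxiId = line.substring(0, line.indexOf(','));
                        local.merge(taxiId, 1, Integer::sum);
                    }
                    return local;
                }));
            }
            // Combine the per-thread partial counts, similar to an OpenMP reduction.
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                partial.get().forEach((id, n) -> total.merge(id, n, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample records standing in for one node's input split.
        List<String> split = Arrays.asList(
                "A1,2016-01-01T08:00:00,36.06,103.83",
                "B2,2016-01-01T08:00:05,36.05,103.80",
                "A1,2016-01-01T08:00:10,36.07,103.84");
        System.out.println(countPointsPerTaxi(split, 2));
    }
}
```

In a real Hadoop job the coarse-grained level would be supplied by the framework, which assigns input splits to nodes; a routine like the one above would only cover the work done inside a single map task. Keeping a private map per thread and merging at the end avoids contention on a shared structure, at the cost of one extra merge pass.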