Improvement of job scheduling algorithm on Hadoop

doi:10.3778/j.issn.1002-8331.1606-0108

Abstract

Load balancing is a common problem in distributed clusters, yet Hadoop does not take the performance differences between nodes into account. Although Hadoop has a load-balancing mechanism, it is not very effective, so load imbalance often occurs at run time. To address this problem, this paper analyzes the Hadoop source code in depth to clarify how Hadoop works, and improves the ordering of tasks in Yarn, Hadoop's resource management framework, by establishing new task-ordering rules. It also proposes performance evaluation indexes for each node, divided into static and dynamic performance indexes. On this basis, the FairScheduler algorithm of Yarn is improved to form a scheduling algorithm that takes node performance into account. The modified Hadoop source code was recompiled, and comparative experiments on a Hadoop platform set up for this study show that adding the node performance indexes effectively solves Hadoop's load-balancing problem and greatly improves Hadoop's running efficiency.

Key words: big data, Hadoop, Yarn, load balancing, FairScheduler algorithm
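The abstract does not give the exact form of the node performance index or of the modified FairScheduler rules. The following Python sketch shows one plausible way such a scheme could work, under loudly stated assumptions: the static index is derived from CPU cores times clock frequency and installed memory, normalized against the strongest node in the cluster; the dynamic index is the idle fraction of CPU and memory; the two are blended with a hypothetical weight `alpha`; applications are ranked by a simplified usage-to-weight ratio; and the number of containers a node may receive per heartbeat is scaled by its score. Every name here (`node_score`, `containers_for_heartbeat`, `assign_on_heartbeat`, `alpha`, `threshold`, `max_assign`) is hypothetical and is neither part of Hadoop's API nor the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node description; the chosen metrics are illustrative assumptions."""
    name: str
    cores: int        # static: number of CPU cores
    cpu_ghz: float    # static: nominal clock frequency
    mem_gb: float     # static: installed memory
    cpu_util: float   # dynamic: current CPU utilization in [0, 1]
    mem_util: float   # dynamic: current memory utilization in [0, 1]


@dataclass
class App:
    name: str
    used_mb: int          # memory currently held by the application
    weight: float = 1.0   # fair-share weight


def static_index(node: Node, cluster: list[Node]) -> float:
    """Hardware capacity relative to the strongest node in the cluster, in (0, 1]."""
    max_compute = max(n.cores * n.cpu_ghz for n in cluster)
    max_mem = max(n.mem_gb for n in cluster)
    compute = node.cores * node.cpu_ghz / max_compute
    memory = node.mem_gb / max_mem
    return 0.5 * compute + 0.5 * memory


def dynamic_index(node: Node) -> float:
    """Fraction of the node that is currently idle, in [0, 1]."""
    return 0.5 * (1.0 - node.cpu_util) + 0.5 * (1.0 - node.mem_util)


def node_score(node: Node, cluster: list[Node], alpha: float = 0.4) -> float:
    """Blend of static and dynamic indexes; alpha is an assumed tuning knob."""
    return alpha * static_index(node, cluster) + (1.0 - alpha) * dynamic_index(node)


def order_apps(apps: list[App]) -> list[App]:
    """Simplified fair-share ordering: smallest usage-to-weight ratio first."""
    return sorted(apps, key=lambda a: a.used_mb / a.weight)


def containers_for_heartbeat(score: float, max_assign: int = 4,
                             threshold: float = 0.2) -> int:
    """Scale how many containers a node may receive in one heartbeat by its score."""
    if score < threshold:
        return 0  # weak or overloaded node: skip this round
    return max(1, round(score * max_assign))


def assign_on_heartbeat(node: Node, cluster: list[Node], apps: list[App],
                        container_mb: int = 1024, max_assign: int = 4) -> list[str]:
    """Grant containers on one heartbeat, re-ranking apps after every grant."""
    n = containers_for_heartbeat(node_score(node, cluster), max_assign)
    granted: list[str] = []
    for _ in range(n):
        if not apps:
            break
        app = order_apps(apps)[0]       # most under-served application
        app.used_mb += container_mb     # account for the new container
        granted.append(app.name)
    return granted


if __name__ == "__main__":
    cluster = [Node("fast", 16, 2.4, 64, cpu_util=0.9, mem_util=0.8),
               Node("slow", 4, 2.0, 16, cpu_util=0.1, mem_util=0.2)]
    for nd in cluster:
        print(nd.name, round(node_score(nd, cluster), 3))
    print(assign_on_heartbeat(cluster[1], cluster, [App("a", 0), App("b", 2048)]))
```

In this sketch a heavily loaded high-end node can score lower than a nearly idle low-end node, so it receives fewer containers until its load drops. The equal 0.5 weights, `alpha`, and `threshold` are placeholders that would need tuning against measurements from a real cluster.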

FENG Xingjie, HE Yang. Improvement of job scheduling algorithm on Hadoop[J]. Computer Engineering and Applications, 2017, 53(12): 85-91.
