Computer Engineering and Applications, 2019, Vol. 55, Issue (10): 67-72. DOI: 10.3778/j.issn.1002-8331.1801-0303

Distributed File System Load Balancing in Cloud Environment

WU Yaoyao1, YANG Geng1,2   

  1. College of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing 210023, China
  • Online: 2019-05-15 Published: 2019-05-13

云环境下分布式文件系统负载均衡研究

吴瑶瑶1,杨  庚1,2   

  1. 南京邮电大学 计算机学院,南京 210023
  2. 江苏省大数据安全与智能处理重点实验室,南京 210023

Abstract: Hadoop Distributed File System (HDFS) is a low-cost, highly fault-tolerant distributed file system that is suitable for running on commodity hardware and offers high-throughput data access for applications on large datasets. However, HDFS still faces some performance optimization problems, such as insufficient load balancing. Although Hadoop ships with a load balancer that can rebalance the cluster, users must supply a static threshold in advance. To address the fixed and subjective nature of this threshold, this paper analyzes, evaluates, and optimizes parameters including disk space utilization, CPU utilization, memory utilization, disk I/O occupancy, and network bandwidth occupancy, and derives an expression for calculating the threshold. The threshold calculation and the resulting load balancing are then verified through theoretical analysis and simulation experiments. The experimental results show that, compared with Hadoop's algorithm based on a static input threshold, this method achieves a better balancing effect and improves the utilization of computing resources.

Key words: cloud environment, Hadoop Distributed File System (HDFS), load balancing, dynamic threshold

摘要: Hadoop分布式文件系统(Hadoop Distributed File System,HDFS)是一种适合在通用硬件上运行的低成本、高度容错性的分布式文件系统,能提供高吞吐量的数据访问,适合针对大规模数据集上的应用。然而,HDFS中还面临一些性能优化问题,如负载均衡不足。虽然Hadoop系统自带的负载均衡器可以实现均衡调整,但需要用户预先给出静态的阈值。为了解决阈值的固定性和主观性,通过对磁盘空间使用率、CPU利用率、内存利用率、磁盘I/O占用率、网络带宽占用率等参数的分析评估优化,形成对阈值的计算表达式,并通过理论分析和仿真实验对阈值的计算和负载均衡进行验证。实验结果表明,相比较Hadoop静态的输入阈值的算法,该方法达到了更好的平衡效果,提高了计算资源的利用率。

关键词: 云环境, Hadoop分布式文件系统(HDFS), 负载均衡, 动态阈值
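
The abstract describes replacing the balancer's user-supplied static threshold with one computed from node metrics, but it does not state the expression itself. The following is only a minimal sketch of that general idea, not the paper's formula: the weights in `WEIGHTS`, the `NodeLoad` type, and the rule "threshold = standard deviation of the composite loads" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical weights for the five metrics named in the abstract.
# The paper's actual weights and threshold expression are not given there.
WEIGHTS = {"disk": 0.4, "cpu": 0.15, "mem": 0.15, "io": 0.15, "net": 0.15}


@dataclass
class NodeLoad:
    """Per-node utilization figures, each expressed as a fraction in [0, 1]."""
    name: str
    disk: float  # disk space utilization
    cpu: float   # CPU utilization
    mem: float   # memory utilization
    io: float    # disk I/O occupancy
    net: float   # network bandwidth occupancy

    def composite(self) -> float:
        # Weighted sum of the five metrics (illustrative load measure).
        return sum(w * getattr(self, key) for key, w in WEIGHTS.items())


def dynamic_threshold(nodes: list[NodeLoad]) -> float:
    # Assumed rule: use the spread of composite loads as the threshold,
    # so the threshold adapts to the cluster instead of being fixed by a user.
    return pstdev([n.composite() for n in nodes])


def classify(nodes: list[NodeLoad]) -> dict[str, str]:
    # A node counts as balanced when its load lies within the threshold
    # of the cluster average; otherwise it is over- or underloaded.
    loads = {n.name: n.composite() for n in nodes}
    avg = mean(loads.values())
    threshold = dynamic_threshold(nodes)
    result = {}
    for name, load in loads.items():
        if load > avg + threshold:
            result[name] = "overloaded"
        elif load < avg - threshold:
            result[name] = "underloaded"
        else:
            result[name] = "balanced"
    return result


if __name__ == "__main__":
    cluster = [
        NodeLoad("A", 0.9, 0.9, 0.9, 0.9, 0.9),
        NodeLoad("B", 0.5, 0.5, 0.5, 0.5, 0.5),
        NodeLoad("C", 0.1, 0.1, 0.1, 0.1, 0.1),
    ]
    print(classify(cluster))
```

In this sketch, blocks would be moved from nodes classified as overloaded to those classified as underloaded. A static threshold would replace the call to `dynamic_threshold` with a constant chosen by the user, which is the subjectivity the paper aims to remove.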