Computer Engineering and Applications, 2012, Vol. 48, Issue (21): 58-61.


Method for optimization of data replication in Hadoop

LI Yeda¹, LIN Weiwei²

1. Guangdong Justice Police Vocational College, Guangzhou 510520, China
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China
Online: 2012-07-21; Published: 2014-05-19


Abstract: Hadoop improves data availability by storing a fixed number of replicas of each data block, an approach that has clear shortcomings. To address them, a mathematical model for optimizing data replication is proposed. Based on the failure rate of data nodes, the data access latency, the network bandwidth of data nodes, and the expected data availability, the model calculates the minimum number of replicas required. The proposed replication optimization method is implemented on Hadoop, and performance tests are conducted. Experimental results show that the proposed model not only improves data availability but also increases storage space utilization in the cloud storage system.
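The abstract does not reproduce the model itself. As a hedged illustration of the availability part only, the sketch below assumes that data nodes fail independently with a common probability p. Under that assumption a block with r replicas is unavailable only when all r replicas fail, so its availability is 1 - p^r, and the smallest r that meets a target availability A is the smallest integer with p^r <= 1 - A. This is a standard textbook simplification, not the authors' model, which additionally weighs data access latency and node network bandwidth; the function names `availability` and `min_replicas` are hypothetical.

```python
def availability(node_failure_prob: float, replicas: int) -> float:
    """Availability of one block with `replicas` copies.

    Assumption (not from the paper): nodes fail independently with the
    same probability, so the block is lost only if every copy fails.
    """
    return 1.0 - node_failure_prob ** replicas


def min_replicas(node_failure_prob: float, target_availability: float) -> int:
    """Smallest replica count r with 1 - p**r >= target availability.

    Hypothetical simplification: ignores the access-latency and
    network-bandwidth factors that the paper's model also considers.
    """
    p, a = node_failure_prob, target_availability
    if not 0.0 < p < 1.0:
        raise ValueError("node_failure_prob must be in (0, 1)")
    if not 0.0 < a < 1.0:
        raise ValueError("target_availability must be in (0, 1)")
    # Increase r until the probability that all r copies fail (p**r)
    # drops to or below the tolerated unavailability (1 - a).
    # The loop terminates because p < 1 and 1 - a > 0.
    r = 1
    while p ** r > 1.0 - a:
        r += 1
    return r


if __name__ == "__main__":
    print(min_replicas(0.05, 0.999))   # → 3
    print(min_replicas(0.2, 0.9999))   # → 6
```

Under these assumptions, reliable nodes can meet a target with fewer copies than a fixed policy would store, while failure-prone nodes need more; for example, p = 0.2 and A = 0.9999 call for 6 replicas, above the HDFS default replication factor of 3 (`dfs.replication`). A per-file factor computed this way could be applied with `hdfs dfs -setrep <numReplicas> <path>`.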

Key words: cloud storage, data replication, optimization, availability
