Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (18): 66-73. DOI: 10.3778/j.issn.1002-8331.1802-0199

• Big Data and Cloud Computing •

一种低单盘故障恢复开销的局部修复码

萧  枫1,唐  聃1,范  迪1,白宁超2   

  1. 成都信息工程大学 软件工程学院,成都 610225
  2. 四川省计算机研究院,成都 610041
  • 出版日期:2018-09-15 发布日期:2018-10-16

Local repairable code with low single disk failure recovery overhead

XIAO Feng1, TANG Dan1, FAN Di1, BAI Ningchao2   

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
  2. Sichuan Institute of Computer Sciences, Chengdu 610041, China
  • Online:2018-09-15 Published:2018-10-16

摘要: 如今随着存储系统规模的扩大和廉价磁盘的大量使用,单一磁盘故障在存储系统中发生故障的概率也不断上升。而在基于RDP编码的阵列存储系统中,恢复单个故障磁盘,需要读取全部的剩余数据磁盘,读取开销大,故障恢复时间长。而故障时间长就会导致系统在恢复过程中出错的概率增大,影响系统整体的稳定性。为进一步降低单个磁盘故障恢复的读取开销,减少恢复时间,提升存储系统可靠性,提出一种局部修复RDP码,通过增加一个局部冗余列来减少故障恢复时需要读取的数据量。实验结果表明改进方法在降低读取开销和减少恢复时间方面相对于传统的RDP单盘故障恢复方法有明显提高,并且能够恢复75%的三盘故障情况。

关键词: RDP码, 单盘故障, 读取开销

Abstract: As storage systems grow in scale and inexpensive disks are widely deployed, the probability of a single-disk failure in a storage system keeps rising. In an array storage system based on RDP code, recovering a single failed disk requires reading all of the remaining data disks, which incurs a high read overhead and a long recovery time. A long recovery time in turn increases the probability of further errors during recovery and harms the overall stability of the system. To further reduce the read overhead of single-disk failure recovery, shorten the recovery time, and improve storage system reliability, a locally repairable RDP code is proposed that adds one local redundancy column to reduce the amount of data that must be read during recovery. Experimental results show that the proposed method clearly improves on the traditional RDP single-disk recovery method in both read overhead and recovery time, and that it can recover from 75% of triple-disk failure cases.

Key words: RDP code, single-disk failure, read overhead
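
The abstract's central idea, that an extra local redundancy column lets a single failed disk be rebuilt from fewer reads, can be illustrated with a minimal sketch. This is not the paper's construction, which the abstract does not specify: it models only one row of XOR row parity (RDP's diagonal parity is omitted), and the names `K`, `LOCAL`, `encode`, `recover_global`, and `recover_local`, as well as the choice of which columns the local parity covers, are assumptions made for illustration.

```python
from functools import reduce
from operator import xor

K = 6                # data columns per row (hypothetical value)
LOCAL = range(0, 3)  # columns covered by the extra local parity (hypothetical choice)

def xor_all(blocks):
    """XOR together an iterable of integer-valued blocks."""
    return reduce(xor, blocks, 0)

def encode(data):
    """Return (row_parity, local_parity) for one row of K data blocks."""
    return xor_all(data), xor_all(data[c] for c in LOCAL)

def recover_global(data, row_parity, failed):
    """Baseline: read every surviving data block plus the row parity.

    data[failed] is never accessed; it stands for the lost block.
    """
    reads = [data[c] for c in range(K) if c != failed] + [row_parity]
    return xor_all(reads), len(reads)

def recover_local(data, row_parity, local_parity, failed):
    """Rebuild one block using the local parity to shrink the read set."""
    if failed in LOCAL:
        # Inside the local group: the other group members plus the local parity.
        reads = [data[c] for c in LOCAL if c != failed] + [local_parity]
    else:
        # Outside the group: row_parity XOR local_parity equals the XOR of the
        # columns outside the group, so only those columns need to be read.
        reads = [data[c] for c in range(K) if c not in LOCAL and c != failed]
        reads += [row_parity, local_parity]
    return xor_all(reads), len(reads)

data = [0x1F, 0x2A, 0x33, 0x47, 0x58, 0x6C]
row_parity, local_parity = encode(data)
print(recover_global(data, row_parity, 1))
print(recover_local(data, row_parity, local_parity, 1))
print(recover_local(data, row_parity, local_parity, 4))
```

Under these assumptions the baseline reads 6 blocks to rebuild column 1, while the local path reads 3 blocks for a column inside the local group and 4 blocks for a column outside it. The actual savings of the proposed code depend on its real layout and are reported in the paper's experiments.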