Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (13): 77-83. DOI: 10.3778/j.issn.1002-8331.1906-0169

• Big Data and Cloud Computing •

Research on Random Access Method of Alluxio

WEI Zhanchen, HUANG Qiulan, SUN Gongxing, LIU Xiaoyu, WANG Yi   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2020-07-01 Published: 2020-07-02

Abstract:

To pursue scalability and fault tolerance, Alluxio simplifies its file system implementation and does not support random data access, yet many real-world applications still require it. In addition, Alluxio's native Java interface is inflexible, does not support traditional applications, and cannot fully exploit the speed of memory. To address these issues, this paper analyzes how Alluxio reads and writes data and proposes a new random access method. Its core idea is to change when data is accessed and cached, so that reads and writes on Alluxio files become reads and writes on files in the local in-memory file system, which makes random access possible. On top of this, memory mapping can further improve local file read/write performance. Test results show that read performance improves by 14.5% and write performance improves by more than 1.4 times compared with the native Java interface. In real application scenarios, appropriate use of Alluxio together with the new method can yield performance gains ranging from several times to tens of times.

Key words: Alluxio distributed in-memory storage system, data random access, memory computing, memory mapping, scientific computing
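
The abstract's central idea, turning reads and writes on a remote file into random access on a memory-mapped local file, can be illustrated with a minimal sketch. This is not the authors' implementation and uses no Alluxio API: the helper names (`make_local_copy`, `write_at`, `read_at`, `demo`) are hypothetical, and an ordinary temporary file stands in for a file staged on a local in-memory file system (for example a tmpfs mount, which is an assumption here).

```python
import mmap
import os
import tempfile


def make_local_copy(directory, size):
    """Hypothetical stand-in for staging a file on the local in-memory
    file system; here it simply creates a zero-filled local file."""
    path = os.path.join(directory, "block.dat")
    with open(path, "wb") as f:
        f.write(b"\0" * size)
    return path


def write_at(path, offset, payload):
    """Overwrite len(payload) bytes at an arbitrary offset via a memory map."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        view[offset:offset + len(payload)] = payload
        view.flush()  # push the modified pages back to the file


def read_at(path, offset, length):
    """Read `length` bytes at an arbitrary offset via a read-only memory map."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        return bytes(view[offset:offset + length])


def demo():
    """Write in the middle of the file, then read two unrelated offsets."""
    with tempfile.TemporaryDirectory() as d:
        path = make_local_copy(d, 4096)
        write_at(path, 1024, b"event-42")
        return read_at(path, 1024, 8), read_at(path, 0, 4)
```

Calling `demo()` writes eight bytes at offset 1024 and reads them back without touching the rest of the file, which is the access pattern a purely streaming interface cannot offer. In a real deployment the directory would point at the local in-memory file system rather than a temporary directory.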