Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (8): 1-8. DOI: 10.3778/j.issn.1002-8331.1812-0009

• Hot Topics and Reviews •

分布式站点间的跨域文件系统

徐  琪1,2,王  聪1,2,程耀东1,陈  刚1   

  1. 中国科学院 高能物理研究所,北京 100049
  2. 中国科学院大学,北京 100049
  • 出版日期:2019-04-15 发布日期:2019-04-15

Cross-Domain File System for Distributed Sites

XU Qi1,2, WANG Cong1,2, CHENG Yaodong1, CHEN Gang1   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2019-04-15 Published:2019-04-15

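Abstract: Most High Energy Physics (HEP) research relies on large scientific facilities at fixed sites, which hold massive amounts of experimental data, so computation often has to work on data stored at remote sites. To handle this massive distributed experimental data, the traditional HEP computing model shares data across domains through grids, but low resource utilization, long response times, and difficult deployment and maintenance limit the use of grid technology for data sharing between small and medium-sized sites. To address data sharing between small and medium-sized sites in HEP computing environments, a remote file system is designed with Streaming & Cache as its core idea. It makes access to remote data behave like local access, offers a highly real-time data access mode, implements on-demand data transfer and management over HTTP, and provides hashed storage of data blocks and a unified file view. Compared with EOS, Lustre and GlusterFS, the distributed file systems commonly used in HEP computing, it offers wide-area-network availability, insensitivity to network latency, and a high-performance data access mode.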

Keywords: High Energy Physics experimental data, cross-domain access, remote file system, cache, high performance

Abstract: Large-scale scientific facilities in High Energy Physics (HEP) produce large volumes of data, and scientific computing has to operate on these distributed data. Traditionally, grid computing technology has been used to share data between sites. However, low resource utilization, long response times and difficult operations limit its use for data sharing between small sites. A cross-domain file system based on streaming & cache is therefore designed for data sharing between small sites in HEP computing. It implements native access to remote data, quick response, on-demand data transmission and management over HTTP, hashed storage of data blocks, and a uniform file view. Compared with the commonly used distributed file systems EOS, GlusterFS and Lustre, it is available over the WAN, insensitive to network delay, and delivers high-performance data access.

Key words: experimental data in High Energy Physics (HEP), cross-domain access, remote file system, cache, high performance