Research of Metadata Management Method of Hierarchical Storage System Based on HDFS
LIU Xiaoyu, XIA Libin, JIANG Xiaowei, SUN Gongxing
1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
LIU Xiaoyu, XIA Libin, JIANG Xiaowei, SUN Gongxing. Research of Metadata Management Method of Hierarchical Storage System Based on HDFS[J]. Computer Engineering and Applications, 2023, 59(17): 257-265.
[1] CHENG Y D,SHI J Y,CHEN G.A survey of high energy physics computing system[J].E-science Technology & Application,2014,5(3):3-10.(in Chinese)
[2] Apache Software Foundation.Apache Hadoop[EB/OL].[2021-12-27].https://hadoop.apache.org/.
[3] ZAHARIA M,CHOWDHURY M,FRANKLIN M J,et al.Spark:cluster computing with working sets[C]//Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing(HotCloud’10),2010.
[4] ZANG D S,HUO J,LIANG D,et al.High energy physics data analysis system based on MapReduce[J].Computer Engineering,2014,40(2):1-5.(in Chinese)
[5] LEI X F,LI Q,SUN G X.HBase-based storage and analysis platform for high energy physics data[J].Computer Engineering,2015,41(6):49-55.(in Chinese)
[6] LI Q,SUN Z Y,LEI X F,et al.Hadoop task selection strategy based on disk I/O performance[J].Computer Engineering,2016,42(11):76-82.(in Chinese)
[7] YIN Q,WEI Z C,HUANG Q L,et al.Development and application of Hadoop massive data migration system[J].Computer Engineering and Applications,2019,55(13):66-71.(in Chinese)
[8] WEI Z C,LIU X Y,HUANG Q L,et al.Research on optimization for iteration-intensive applications on Spark[J].Computer Engineering and Applications,2020,56(23):68-73.(in Chinese)
[9] LI Q.Study of Hadoop key technologies for high energy physics data analysis[D].Beijing:University of Chinese Academy of Sciences,2017.(in Chinese)
[10] HUANG Q L,WEI Z C,SUN G X,et al.Using Hadoop for high energy physics data analysis[C]//International Conference on Big Scientific Data Management(BigSDM 2018),Beijing,China.Cham:Springer,2018.
[11] CHEN G.Data and computing for high energy physics experiments[J].Scientia Sinica-Physica,Mechanica & Astronomica,2021,51(9):14-23.(in Chinese)
[12] LI W D,SHI J Y,WANG L,et al.Off-line calculation for high energy physics experiments[J].Modern Physics,2016,28(3):38-45.(in Chinese)
[13] LIU A G.Distributed file system metadata service model[EB/OL].[2021-12-20].https://blog.csdn.net/liuaigui/article/details/6749188.(in Chinese)
[14] SHVACHKO K,KUANG H,RADIA S,et al.The Hadoop distributed file system[C]//26th Symposium on Mass Storage Systems and Technologies(MSST),Incline Village,NV,USA,2010.
[15] GHEMAVAT S,GOBIOFF H,LEUNG S T.The Google file system[C]//19th ACM Symposium on Operating System Principles(SOSP),Bolton Landing,NY,USA,2003.
[16] Apache Software Foundation.HDFS federation[EB/OL].[2021-12-26].https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html.
[17] Apache Software Foundation.HDFS high availability[EB/OL].[2021-12-26].https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html.
[18] WANG L.The design and optimization of the metadata service in mass storage system[D].Beijing:University of Chinese Academy of Sciences,2011.(in Chinese)
[19] DAI H,WANG Y,KENT K B,et al.The state of the art of metadata managements in large-scale distributed file systems:scalability,performance and availability[J].IEEE Transactions on Parallel and Distributed Systems,2022,33(12):3850-3869.
[20] SINGH H J,BAWA S.Scalable metadata management techniques for ultra-large distributed storage systems—a systematic review[J].ACM Computing Surveys,2019,51(4):1-37.
[21] LI J W,HUANG S Y,REN Y J,et al.Enabling secure and space-efficient metadata management in encrypted deduplication[J].IEEE Transactions on Computers,2022,71(4):959-970.
[22] Apache Software Foundation.Enable support for heterogeneous storages in HDFS[EB/OL].[2021-12-26].https://issues.apache.org/jira/browse/HDFS-2832.
[23] Apache Software Foundation.Archival storage,SSD & memory[EB/OL].[2021-12-26].https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html.
[24] KRISH K R,ANWAR A,BUTT A R.hatS:a heterogeneity-aware tiered storage for Hadoop[C]//14th IEEE/ACM International Symposium on Cluster,Cloud and Grid Computing,Chicago,IL,USA,2014.
[25] KAKOULLI E,HERODOTOU H.OctopusFS:a distributed file system with tiered storage management[C]//SIGMOD’17:2017 ACM International Conference on Management of Data,Chicago,IL,USA,2017.
[26] ISLAM N S,LU X Y,WASI-UR-RAHMAN M,et al.Triple-H:a hybrid approach to accelerate HDFS on HPC clusters with heterogeneous storage architecture[C]//15th IEEE/ACM International Symposium on Cluster,Cloud and Grid Computing(CCGRID),Shenzhen,China,2015.
[27] CERN.EOSCTA Docs[EB/OL].[2021-12-26].https://eoscta.docs.cern.ch/.
[28] CERN.QuarkDB documentation[EB/OL].[2021-12-26].https://quarkdb.web.cern.ch/quarkdb/docs/0.4.3/.
[29] Google Developers.Protocol buffers[EB/OL].[2021-12-27].https://developers.google.cn/protocol-buffers?hl=zh-cn.
[30] tech2max.com.Mhvtl[EB/OL].[2022-12-02].http://www.mhvtl.com/.
[31] PRESTI G L,BARRING O,EARL A,et al.CASTOR:a distributed storage resource facility for high performance data processing at CERN[C]//24th IEEE Conference on Mass Storage Systems and Technologies(MSST 2007),San Diego,CA,USA,2007.
[32] CANO E,BAHYL V,CAFFY C,et al.CERN tape archive:production status,migration from CASTOR and new features[C]//24th International Conference on Computing in High Energy and Nuclear Physics(CHEP 2019),Adelaide,Australia,2020.
[33] BAUER R.Evaluation of CTA for use at Fermilab[EB/OL].[2022-12-02].https://lss.fnal.gov/archive/2022/slides/fermilab-slides-22-009-scd.pdf.
[34] Apache Software Foundation.HDFS erasure coding[EB/OL].[2021-12-27].https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.