Computer Engineering and Applications, 2023, Vol. 59, Issue (15): 253-263. DOI: 10.3778/j.issn.1002-8331.2206-0067

Big Data and Cloud Computing

Design and Implementation of Tape Library Storage System Based on Microservice Architecture

LIU Xiaoyu, XIA Libin, JIANG Xiaowei, SUN Gongxing   

1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Online: 2023-08-01    Published: 2023-08-01

Abstract: Building an HDFS tiered storage system with a tape storage layer is an important part of improving the Hadoop ecosystem in high energy physics (HEP). However, traditional HEP tape storage management systems (such as Castor and CTA) do not support HDFS disk storage as an upper layer. In addition, HEP data volumes are growing rapidly, Internet technology continues to develop, and user requirements change quickly. As a result, traditional tape storage management systems increasingly face problems with scalability and load balancing, along with rising development and maintenance costs. This paper designs and develops a tape storage management system based on a microservice architecture. The system supports HDFS disk storage as its upper layer. It distributes functions such as tape library resource management, file transfer, and tape reading and writing across service instances as microservices, which spreads the service load. Furthermore, traditional load balancing algorithms are inefficient, so the system implements a load balancing algorithm based on a server response factor. Servers are sorted by a response factor computed from user-defined parameters, so that each user request is dispatched to the server with the highest response factor. Experimental results show that the tape library storage system meets the requirements of tape-tier management for HDFS hierarchical file storage. Compared with the round-robin (polling) algorithm, the response-factor-based algorithm improves archiving performance by more than 6% and retrieval performance by more than 64%. Compared with the random algorithm, it improves archiving performance by more than 9% and retrieval performance by more than 64%. The resulting system shows that, compared with traditional systems, a microservice architecture decouples system components, balances distributed load, and makes system development and maintenance more convenient.
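The abstract does not give the formula for the server response factor. The following Python sketch illustrates the general idea under stated assumptions. The metrics (idle CPU, free memory, free network bandwidth, active task count), their weights, the weighted-sum scoring rule, and the server names `tape-io-N` are all hypothetical stand-ins for the paper's user-defined parameters. Round-robin and random selection are included as the two baselines the abstract compares against.

```python
from dataclasses import dataclass
import itertools
import random
from typing import Callable, Dict, List, Sequence

@dataclass
class ServerStats:
    """Hypothetical per-server metrics standing in for the paper's user-defined parameters."""
    name: str
    cpu_idle: float    # idle CPU fraction, 0..1
    mem_free: float    # free memory fraction, 0..1
    net_free: float    # free network bandwidth fraction, 0..1
    active_tasks: int  # in-flight archive/retrieve requests

# Hypothetical user-defined weights (treated as read-only; not taken from the paper).
DEFAULT_WEIGHTS: Dict[str, float] = {"cpu_idle": 0.4, "mem_free": 0.2, "net_free": 0.3, "tasks": 0.1}

def response_factor(s: ServerStats, w: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Assumed scoring rule: weighted sum of free resources; higher means more responsive."""
    task_score = 1.0 / (1 + s.active_tasks)  # 1.0 when idle, decreasing as tasks queue up
    return (w["cpu_idle"] * s.cpu_idle
            + w["mem_free"] * s.mem_free
            + w["net_free"] * s.net_free
            + w["tasks"] * task_score)

def rank_servers(servers: Sequence[ServerStats], w: Dict[str, float] = DEFAULT_WEIGHTS) -> List[ServerStats]:
    """Sort servers by response factor, highest first."""
    return sorted(servers, key=lambda s: response_factor(s, w), reverse=True)

def pick_by_response_factor(servers: Sequence[ServerStats], w: Dict[str, float] = DEFAULT_WEIGHTS) -> ServerStats:
    """Dispatch to the top-ranked server (max() would avoid the full sort)."""
    return rank_servers(servers, w)[0]

# Baselines the abstract compares against.
def make_round_robin(servers: Sequence[ServerStats]) -> Callable[[], ServerStats]:
    """Return a picker that cycles through servers in fixed order, ignoring their state."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)

def pick_random(servers: Sequence[ServerStats], rng: random.Random) -> ServerStats:
    """Pick a server uniformly at random, ignoring its state."""
    return rng.choice(servers)

servers = [
    ServerStats("tape-io-1", cpu_idle=0.2, mem_free=0.5, net_free=0.3, active_tasks=4),
    ServerStats("tape-io-2", cpu_idle=0.8, mem_free=0.6, net_free=0.7, active_tasks=1),
    ServerStats("tape-io-3", cpu_idle=0.5, mem_free=0.9, net_free=0.4, active_tasks=2),
]
print(pick_by_response_factor(servers).name)  # prints "tape-io-2"
```

Round-robin and random selection ignore server state entirely. The response-factor policy instead reads current metrics on each dispatch, so a heavily loaded server (here `tape-io-1`, with four active tasks) drops to the bottom of the ranking.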

Key words: tape library storage, microservice architecture, load balancing, response index calculation model, HDFS system