Computer Engineering and Applications, 2017, Vol. 53, Issue (8): 8-14. DOI: 10.3778/j.issn.1002-8331.1610-0368


Elastic computing resource management mechanism for high-energy physics cloud platform

CHENG Zhenjing1,2, LI Haibo1, HUANG Qiulan1, CHENG Yaodong1, CHEN Gang1   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
Online: 2017-04-15; Published: 2017-04-28


Abstract: Virtualization, as a new resource management technology, is increasingly used in high-energy physics. Static virtual machine clusters can no longer meet the dynamic computing-resource demands of multiple job queues. To address this, an elastic computing resource management system for multiple job queues in a cloud computing environment has been designed and implemented. The high-throughput computing system HTCondor runs the high-energy physics jobs, and the open-source cloud computing platform OpenStack manages the virtual computing nodes. An elastic resource management algorithm based on dual thresholds, combined with a virtual resource quota service, is proposed to scale the resource pool as a whole, and a two-stage buffer pool is designed to improve scaling efficiency. The system is currently deployed on IHEPCloud, the public service cloud of IHEP. Practical operation shows that as resource demand changes, the system dynamically adjusts the number of virtual computing nodes in each queue, and the CPU utilization of the computing resources is significantly higher than under traditional resource management.
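The abstract names a dual-threshold algorithm combined with a quota service but does not state what the thresholds measure or their values. The Python sketch below is purely illustrative and rests on assumptions: it takes per-queue slot occupancy (running jobs over available slots) as the metric, uses hypothetical thresholds of 0.9 and 0.3 and a step of 2 nodes, and caps growth at a per-queue quota. `QueueState` and `plan_scaling` are hypothetical names, not part of HTCondor, OpenStack, or the authors' system.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    # Hypothetical snapshot of one job queue; field names are assumptions.
    name: str
    idle_jobs: int      # jobs waiting to run
    running_jobs: int   # jobs currently running
    vms: int            # virtual worker nodes assigned to this queue
    quota: int          # maximum nodes this queue may hold (assumed quota service)
    slots_per_vm: int = 1


def plan_scaling(q: QueueState, upper: float = 0.9, lower: float = 0.3,
                 step: int = 2) -> int:
    """Return the change in node count: positive = boot, negative = release.

    Dual-threshold rule (assumed): expand when occupancy reaches `upper`
    and jobs are waiting; shrink when occupancy falls to `lower` and no
    jobs are waiting; otherwise leave the queue unchanged.
    """
    capacity = q.vms * q.slots_per_vm
    if capacity == 0:
        # No nodes yet: boot some only if work is waiting, within quota.
        return min(step, q.quota) if q.idle_jobs > 0 else 0
    occupancy = q.running_jobs / capacity
    if occupancy >= upper and q.idle_jobs > 0:
        # Above the upper threshold: grow, but never past the quota.
        return max(0, min(step, q.quota - q.vms))
    if occupancy <= lower and q.idle_jobs == 0:
        # Below the lower threshold: release idle nodes, keep busy ones.
        busy_vms = -(-q.running_jobs // q.slots_per_vm)  # ceiling division
        return -min(step, q.vms - busy_vms)
    return 0


if __name__ == "__main__":
    queues = [
        QueueState("queue_a", idle_jobs=50, running_jobs=10, vms=10, quota=15),
        QueueState("queue_b", idle_jobs=0, running_jobs=1, vms=8, quota=20),
    ]
    for q in queues:
        print(q.name, plan_scaling(q))
```

In a real deployment the job counts would come from the batch system and the returned delta would be translated into virtual machine boot or delete requests; those integrations, and the two-stage buffer pool, are left out of this sketch.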

Key words: elastic computing resource management, virtual computing cluster, high-energy physics computing, dynamic scheduling, HTCondor, OpenStack, resource utilization
