Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (21): 60-64. DOI: 10.3778/j.issn.1002-8331.1912-0278


High Energy Physics Data Placement Strategy Based on Random Forest

CHENG Zhenjing, CHENG Yaodong, CHEN Gang, WANG Lu, LI Haibo, HU Qingbao   

  1. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Tianfu Cosmic Ray Research Center, Institute of High Energy Physics, Chinese Academy of Sciences, Chengdu 610041, China
  • Online: 2020-11-01 Published: 2020-11-03





With the continuing development of high energy physics experiments such as the Large High Altitude Air Shower Observatory (LHAASO), a large amount of data at the PB scale will be collected, stored, and analyzed every year. At present, storage systems generally use a random data placement strategy, which does not fully account for the differences among data access scenarios, servers, and storage devices. This paper proposes a data placement strategy based on the random forest algorithm. Storage devices are divided into storage pools (a Fast pool and a Normal pool) according to their performance. The algorithm predicts the access pattern of a new file and selects the most suitable location for it, taking the load of the target devices into account. The algorithm is evaluated on data samples collected from the production storage system of the LHAASO experiment.
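
The two steps named in the abstract, predicting a file's access pattern with a forest and then choosing a device with load in mind, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The labels "hot" and "cold", the feature names, the 0.9 load limit, the fallback to the other pool, and the hand-written stand-ins for trained trees are all assumptions made for illustration; only the Fast and Normal pools come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    pool: str    # "fast" or "normal", mirroring the two pools in the abstract
    load: float  # hypothetical utilisation in [0, 1]


def majority_vote(trees, features):
    """Forest-style prediction: every tree votes, the most common label wins."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


def choose_device(label, devices, max_load=0.9):
    """Pick the least-loaded device in the pool that matches the label."""
    preferred = "fast" if label == "hot" else "normal"
    candidates = [d for d in devices
                  if d.pool == preferred and d.load < max_load]
    if not candidates:
        # Assumed fallback: use any device that is still below the limit.
        candidates = [d for d in devices if d.load < max_load]
    if not candidates:
        raise RuntimeError("no device below the load limit")
    return min(candidates, key=lambda d: d.load)


# Hand-written stand-ins for trained decision trees (hypothetical features).
trees = [
    lambda f: "hot" if f["size_mb"] < 100 else "cold",
    lambda f: "hot" if f["is_analysis_output"] else "cold",
    lambda f: "hot" if f["recent_dir_reads"] > 50 else "cold",
]

devices = [
    Device("ssd-01", "fast", 0.95),
    Device("ssd-02", "fast", 0.40),
    Device("hdd-01", "normal", 0.20),
]

features = {"size_mb": 20, "is_analysis_output": False, "recent_dir_reads": 120}
label = majority_vote(trees, features)
target = choose_device(label, devices)
print(label, target.name)
```

In a real system the stand-in trees would be replaced by a forest trained on access logs, and the load value would come from monitoring of the storage servers.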

Key words: random forest, distributed storage system, heterogeneous storage, storage pool, data placement strategy, access scenario


