Computer Engineering and Applications, 2018, Vol. 54, Issue (23): 230-237. DOI: 10.3778/j.issn.1002-8331.1708-0220

• Engineering and Applications •

Research on an Event-Level Data Management and Transfer System for High Energy Physics

WANG Cong1,2, XU Qi1,2, CHENG Yaodong1, CHEN Gang1

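  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Publication date: 2018-12-01  Release date: 2018-11-30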

Research of event data management and data transfer system in high energy physics

WANG Cong1,2, XU Qi1,2, CHENG Yaodong1, CHEN Gang1   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2018-12-01  Published: 2018-11-30

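Summary: The continuing advance of high energy physics experiments has produced data at the PB and even EB scale, and the acquisition, storage, transfer and sharing, analysis, and management of these data all face great challenges. To meet these challenges, an event-oriented data management system is designed and implemented, which effectively addresses the low efficiency of event data processing and the low resource utilization at remote sites. An event index system based on a NoSQL database is designed: through feature extraction from event data, the attributes of greatest interest to physicists are selected as indexes and stored in the database, and inverted indexing is used to improve the efficiency of event data retrieval. Caching is optimized for event data to reduce data conversion and storage overhead. A cross-domain data transfer scheme is proposed that makes full use of network bandwidth and reduces the latency of data processing at remote sites. The system was validated experimentally; the results show that event-level indexing significantly improves the retrieval efficiency of event data, and that the data transfer system can utilize more than 90% of the network bandwidth.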

Keywords: high energy physics, data management, data transfer, performance optimization, multi-stream transfer
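The event-index idea summarized above can be illustrated with a minimal sketch. It assumes a toy in-memory inverted index in place of the NoSQL database, and the attribute names (`run`, `n_charged`) and file names are hypothetical; the summary does not say which attributes or which storage schema the authors actually use.

```python
from collections import defaultdict


class EventIndex:
    """Toy inverted index: (attribute, value) -> set of event locations.

    An event location is a (file_name, entry_number) pair. This is an
    in-memory stand-in for a NoSQL-backed index, not the paper's schema.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, location, attributes):
        # Register the event under every (attribute, value) pair it carries.
        for name, value in attributes.items():
            self._postings[(name, value)].add(location)

    def query(self, **criteria):
        # Intersect the posting sets of all requested attribute values,
        # so only events matching every criterion are returned.
        sets = [self._postings.get(item, set()) for item in criteria.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


# Hypothetical usage: index three events, then select by two attributes.
index = EventIndex()
index.add(("run_001.root", 0), {"run": 1, "n_charged": 2})
index.add(("run_001.root", 1), {"run": 1, "n_charged": 4})
index.add(("run_002.root", 0), {"run": 2, "n_charged": 2})
hits = index.query(run=1, n_charged=2)
print(sorted(hits))  # [('run_001.root', 0)]
```

A lookup of this kind returns only the locations of matching events, so an analysis job can read those entries directly instead of scanning every file.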

Abstract: As high energy physics experiments continue to advance, they produce data at the PB and even EB scale. The acquisition, storage, transmission, sharing, analysis, and management of these data face great challenges. To meet these challenges, this paper designs and implements an event-oriented data management system that addresses the low efficiency of event data processing and the low utilization of resources at remote sites. First, an event indexing system based on a NoSQL database is designed. By extracting features from event data, the attributes of greatest interest to physicists are selected as index keys and stored in the database, and inverted indexing is adopted to improve the efficiency of event retrieval. Then, caching is optimized for event data, reducing data conversion and storage overhead. In addition, a cross-domain data transfer scheme is proposed, which makes full use of the network bandwidth and reduces the delay of data processing at remote sites. The data transfer system is built on Tornado, a Python web framework and asynchronous networking library. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling. The experimental results show that event-level indexing significantly improves the retrieval efficiency of event data, and that the data transfer system can utilize more than 90% of the network bandwidth.

Key words: high energy physics, data management, data transfer, performance optimization, multi-flow transmission
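The "multi-flow transmission" keyword refers to moving one data set over several concurrent streams. The sketch below illustrates only the general technique, under loud assumptions: it copies byte ranges between two local files with a thread pool as a stand-in for parallel network streams, it does not use Tornado, and the function names `split_ranges`, `copy_range`, and `multi_stream_copy` are hypothetical rather than taken from the paper.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, n_streams):
    """Split [0, size) into at most n_streams contiguous (offset, length) chunks."""
    if size <= 0 or n_streams <= 0:
        return []
    base, extra = divmod(size, n_streams)
    ranges, offset = [], 0
    for i in range(n_streams):
        # The first `extra` chunks take one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src, dst, offset, length, block=1 << 20):
    """Copy one byte range; each call opens its own handles, like one stream."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(block, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)


def multi_stream_copy(src, dst, n_streams=4):
    """Copy src to dst using up to n_streams concurrent range copies."""
    size = os.path.getsize(src)
    # Pre-size the destination so every stream can write at its own offset.
    with open(dst, "wb") as f:
        f.truncate(size)
    ranges = split_ranges(size, n_streams)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        futures = [pool.submit(copy_range, src, dst, off, ln) for off, ln in ranges]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
```

On a long-distance link a single stream is often limited by round-trip latency rather than by capacity, which is why several ranges in flight at once can use more of the available bandwidth. The number of streams would need tuning for a real network.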