Computer Engineering and Applications, 2021, Vol. 57, Issue (20): 253-262. DOI: 10.3778/j.issn.1002-8331.2006-0059


Event Level Parallelization Research of BESIII Experimental Software

MA Zhentai, ZHANG Xiaomei, SUN Gongxing   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2021-10-15  Published: 2021-10-21

Abstract:

Job-level parallelism in the BESIII experimental software consumes a large amount of memory, while sequence-level parallelism requires complex sorting. To address these problems, this article proposes an event-level parallel solution. Since the data of each event are independent, data parallelism is adopted, and coarse-grained locking at the granularity of event groups strikes the best balance between the performance gain of thread parallelism and the overhead of thread interaction. An event-group FIFO queue is created in memory, and semaphores are set for the three event-group states (idle, data ready, and processing complete) so that the file input thread, the file output thread, and the event-processing threads interact efficiently. A mapping table is established to allocate events to the event-processing threads and to update the corresponding context. As a result, the data flow in their original order and the complex sorting work is avoided. Lazy loading is applied to reduce the memory wasted on invalid data. For the tuple output of event-level parallelism, a three-layer mapping lets each thread fill only its corresponding tree. Experimental results show that the event-level parallel solution reduces memory consumption by 46.5% and improves performance significantly.
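The thread-interaction scheme the abstract describes — an in-memory FIFO of event-group slots, with semaphores for the idle, data-ready, and processing-complete states coordinating the file input thread, the event-processing threads, and the file output thread — can be sketched as follows. This is a minimal illustration under assumed names and sizes (`GROUP_SIZE`, `N_SLOTS`, the stand-in "processing" step), not the actual BOSS implementation:

```python
import threading

GROUP_SIZE = 4                # events per event group (hypothetical)
N_SLOTS = 3                   # event-group slots kept in memory
N_WORKERS = 2                 # event-processing threads
N_EVENTS = 24

n_groups = N_EVENTS // GROUP_SIZE
slots = [None] * N_SLOTS                              # event-group buffers
free = threading.Semaphore(N_SLOTS)                   # "idle" state
ready = threading.Semaphore(0)                        # "data ready" state
done = [threading.Event() for _ in range(N_SLOTS)]    # "processing complete"
ready_q, q_lock = [], threading.Lock()                # FIFO of data-ready slots
output = []

def file_input():
    for g in range(n_groups):
        free.acquire()                        # wait for an idle slot
        slot = g % N_SLOTS                    # slots reused in ring order
        slots[slot] = list(range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE))
        with q_lock:
            ready_q.append(slot)
        ready.release()                       # group is now data-ready
    for _ in range(N_WORKERS):                # poison pills to stop workers
        with q_lock:
            ready_q.append(None)
        ready.release()

def event_worker():
    while True:
        ready.acquire()                       # wait for a data-ready group
        with q_lock:
            slot = ready_q.pop(0)
        if slot is None:
            return
        slots[slot] = [2 * e for e in slots[slot]]    # stand-in "processing"
        done[slot].set()                      # group is processing-complete

def file_output():
    for g in range(n_groups):                 # drain groups in input order
        slot = g % N_SLOTS
        done[slot].wait()                     # even if workers finish out of order
        done[slot].clear()
        output.extend(slots[slot])
        free.release()                        # slot is idle again

threads = ([threading.Thread(target=t) for t in (file_input, file_output)]
           + [threading.Thread(target=event_worker) for _ in range(N_WORKERS)])
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the output thread drains slots in the same ring order the input thread fills them, the event data leave in their original order even when workers finish out of order — the property the abstract relies on to avoid sorting.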
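The lazy-loading idea — deferring the actual read of event data until it is first accessed, so data that are never touched cost neither I/O nor memory — can be sketched like this (`LazyBranch` and `expensive_read` are hypothetical names, not BOSS APIs):

```python
class LazyBranch:
    """Defers loading its payload until the first access (illustrative sketch)."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self._loaded = False

    @property
    def data(self):
        if not self._loaded:          # load only on first access
            self._data = self._loader()
            self._loaded = True
        return self._data

loads = []

def expensive_read():
    loads.append(1)                   # count how many real reads happen
    return [1.0, 2.0, 3.0]

branch = LazyBranch(expensive_read)   # nothing has been read yet
_ = branch.data                       # first access triggers the read
_ = branch.data                       # second access reuses the cached data
```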

Key words: sorting, event-group FIFO queue, event-group allocation, thread interaction, three-layer mapping
