Optimization of FMM’s short range calculation with multi-GPU architecture

Abstract

In recent years, the hybrid CPU+GPU architecture has become the dominant architecture of high-performance computers. Considering the particular characteristics of hybrid architectures, this paper analyzes the traditional Amdahl's law and extends it to hybrid architectures. Load imbalance among GPUs and communication latency are the two main problems in the short-range calculation of the FMM algorithm. Guided by the extended Amdahl's law, this paper presents a multi-GPU scheduling model and a two-level pipelining model to address them. The scheduling model balances the workload of each GPU and mitigates the effect of the non-uniform short-range calculation. The two-level pipelining model lets the CPU and GPU work in parallel, which hides memory-access latency and improves the utilization of the computing units. Experimental results show that the proposed methods are feasible and speed up the algorithm.
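The abstract does not spell out the hybrid form of Amdahl's law or the details of the scheduling model, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes the short-range work can be split into independent units with known costs (for example, hypothetical per-leaf-box pair-interaction counts) and uses the standard longest-processing-time-first (LPT) greedy heuristic as a stand-in for the paper's scheduler. It then estimates speedup in an Amdahl style in which the parallel phase lasts as long as the most heavily loaded GPU. With perfectly even loads this reduces to the classic S = 1 / ((1 - p) + p/n); any imbalance lowers it, which is why balancing the non-uniform short-range work matters. The function names `amdahl`, `lpt_schedule`, `round_robin_loads`, and `hybrid_speedup` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def amdahl(p: float, n: int) -> float:
    """Classic Amdahl speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def lpt_schedule(costs: Sequence[float], num_gpus: int) -> Tuple[List[List[int]], List[float]]:
    """Assign work units to GPUs, largest first, each to the least-loaded GPU.

    Returns (assignment, loads): assignment[g] lists the unit indices given to
    GPU g, and loads[g] is the summed cost on GPU g.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    heap = [(0.0, g) for g in range(num_gpus)]  # (current load, GPU id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_gpus)]
    loads = [0.0] * num_gpus
    # Visit units in descending cost order (ties keep their original order).
    for idx in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, g = heapq.heappop(heap)  # least-loaded GPU so far
        assignment[g].append(idx)
        loads[g] = load + costs[idx]
        heapq.heappush(heap, (loads[g], g))
    return assignment, loads


def round_robin_loads(costs: Sequence[float], num_gpus: int) -> List[float]:
    """Naive baseline: deal units to GPUs in index order, ignoring their cost."""
    loads = [0.0] * num_gpus
    for i, c in enumerate(costs):
        loads[i % num_gpus] += c
    return loads


def hybrid_speedup(serial_time: float, gpu_loads: Sequence[float]) -> float:
    """Amdahl-style speedup when the parallel phase lasts as long as the busiest GPU.

    Assumed baseline: the serial CPU part plus all GPU work run on one device.
    """
    baseline = serial_time + sum(gpu_loads)
    return baseline / (serial_time + max(gpu_loads))


if __name__ == "__main__":
    costs = [7.0, 5.0, 4.0, 3.0, 3.0, 2.0]
    assignment, loads = lpt_schedule(costs, 2)
    print(assignment, loads)  # [[0, 3, 5], [1, 2, 4]] [12.0, 12.0]
    print(hybrid_speedup(2.0, loads))  # 26/14, about 1.857
    print(hybrid_speedup(2.0, round_robin_loads(costs, 2)))  # 26/16 = 1.625
```

For these example costs on two GPUs with 2 time units of serial CPU work, LPT yields loads of 12 and 12, while dealing units in index order yields 14 and 10, so the estimated speedup falls from 26/14 to 26/16. LPT is guaranteed to stay within a factor of 4/3 of the optimal makespan, but this sketch ignores data-transfer costs and differences between GPUs.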

Keywords: hybrid architecture, GPU, Fast Multipole Method (FMM), PetFMM, pipelining

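Abstract (Chinese version, translated): In recent years, the GPU+CPU hybrid architecture has become the main architecture of many high-performance computers and has been widely applied in high-performance computing. Because of the particular characteristics of hybrid architectures, the traditional Amdahl's law is analyzed and extended to hybrid architectures. To address the load-balancing and communication-latency problems that arise in the short-range calculation of the FMM algorithm on a multi-GPU+CPU hybrid architecture, a multi-GPU scheduling model and a two-level pipelining model are proposed under the guidance of the hybrid-architecture Amdahl's law. The scheduling model effectively balances the load among multiple GPUs and mitigates the problems caused by the non-uniformity of the short-range calculation. The two-level pipelining model lets the CPU and GPU work in parallel, hiding memory-access latency by overlapping computation with memory access and improving the utilization of the computing units. Experimental verification and data comparison demonstrate the feasibility of these optimizations, which further accelerate the execution of the algorithm.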
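The abstract describes the two-level pipeline only as overlapping CPU work, memory access, and GPU computation. The sketch below is a hypothetical timing model, not the authors' implementation. It assumes the short-range work is cut into chunks, each passing through three stages (CPU-side preparation, host-to-device transfer, GPU kernel), that each stage processes one chunk at a time in order, and that buffers between stages are unbounded. It compares running chunks strictly one after another with the makespan when consecutive chunks occupy different stages at the same time. The stage names and the functions `sequential_time` and `pipelined_time` are illustrative.

```python
from typing import Sequence


def sequential_time(stage_times: Sequence[Sequence[float]]) -> float:
    """Total time when each chunk finishes every stage before the next chunk starts."""
    return sum(sum(chunk) for chunk in stage_times)


def pipelined_time(stage_times: Sequence[Sequence[float]]) -> float:
    """Makespan when different stages work on different chunks concurrently.

    stage_times[i][s] is the time chunk i spends in stage s. A stage starts a
    chunk once it has finished its previous chunk and the chunk has left the
    preceding stage (no limit on buffered chunks between stages).
    """
    if not stage_times:
        return 0.0
    num_stages = len(stage_times[0])
    stage_free = [0.0] * num_stages  # time each stage finished its latest chunk
    for chunk in stage_times:
        ready = 0.0  # time the chunk left the previous stage (0 for the first stage)
        for s in range(num_stages):
            start = max(ready, stage_free[s])
            stage_free[s] = start + chunk[s]
            ready = stage_free[s]
    return stage_free[-1]


if __name__ == "__main__":
    # Hypothetical per-chunk times: CPU prep, host-to-device copy, GPU kernel.
    chunks = [[2.0, 1.0, 3.0]] * 4
    print(sequential_time(chunks))  # 24.0
    print(pipelined_time(chunks))  # 15.0
```

Once the pipeline fills, each additional chunk costs only the bottleneck stage (3 units here) instead of the full 6, so overlap pays off most when stage times are similar. The model ignores contention between stages. In real CUDA code, copies overlap with kernels only when issued asynchronously in separate streams from pinned host memory.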


CAO Min, TIAN Li, ZHU Yonghua. Optimization of FMM’s short range calculation with multi-GPU architecture[J]. Computer Engineering and Applications, 2013, 49(8): 37-42.

曹旻，田力，朱永华. 多GPU混合结构下FMM近程算法的优化[J]. 计算机工程与应用, 2013, 49(8): 37-42.

