Computer Engineering and Applications, 2013, Vol. 49, Issue (8): 37-42.


Optimization of FMM’s short range calculation with multi-GPU architecture

CAO Min, TIAN Li, ZHU Yonghua   

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
  Online: 2013-04-15; Published: 2013-04-15

Abstract: In recent years, hybrid CPU+GPU architectures have become the mainstream architecture of high-performance computers. Because of the particular characteristics of hybrid architectures, this paper revisits the traditional Amdahl's law and extends it to hybrid architectures. On a multi-GPU + CPU hybrid system, the short-range (near-field) calculation of the FMM algorithm faces two main problems: balancing the workload across GPUs and reducing communication latency. Guided by the extended Amdahl's law, this paper presents a multi-GPU scheduling model and a two-level pipelining model to address them. The scheduling model effectively balances the workload of each GPU and mitigates the effect of the non-uniform short-range calculation. The two-level pipelining model lets the CPU and GPU work in parallel; by overlapping computation with memory access, it hides memory access latency and improves the utilization of the computing units. Experimental results show that the proposed optimizations are feasible and further speed up the algorithm.
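As an illustration only: the abstract does not give the paper's scheduling model or its extended Amdahl's law, so the sketch below swaps in a standard greedy longest-processing-time (LPT) rule and a simple Amdahl-style estimate. It assumes the cost of each near-field task (for example, the number of particle-pair interactions of one leaf box) is known in advance; `lpt_schedule`, `hybrid_speedup`, and `amdahl_speedup` are hypothetical names. What it shows is that with several GPUs the parallel part finishes only when the most loaded GPU does, so uneven short-range work directly lowers the achievable speedup.

```python
"""Sketch: balancing non-uniform near-field FMM work across GPUs.

Assumptions (not from the paper): per-task costs are known beforehand, and the
greedy LPT rule stands in for the paper's multi-GPU scheduling model.
"""
import heapq
from typing import List, Tuple


def lpt_schedule(costs: List[float], num_gpus: int) -> Tuple[List[List[int]], List[float]]:
    """Assign tasks largest-first, each onto the currently least-loaded GPU.

    Returns (task indices per GPU, total load per GPU).
    """
    heap = [(0.0, g) for g in range(num_gpus)]  # (current load, gpu id); ties go to the lower id
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_gpus)]
    loads = [0.0] * num_gpus
    for idx in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, g = heapq.heappop(heap)
        assignment[g].append(idx)
        loads[g] = load + costs[idx]
        heapq.heappush(heap, (loads[g], g))
    return assignment, loads


def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl's law: fraction p is perfectly parallel over n units."""
    return 1.0 / ((1.0 - p) + p / n)


def hybrid_speedup(serial_time: float, loads: List[float]) -> float:
    """Amdahl-style estimate for uneven GPU loads.

    Baseline: one device runs the serial part plus all near-field work.
    Parallel run: the serial part plus the most loaded GPU, i.e. max(loads)
    replaces the ideal sum(loads) / n of the classic law.
    """
    single_device = serial_time + sum(loads)
    return single_device / (serial_time + max(loads))
```

For task costs [8, 7, 6, 5, 4, 3, 2, 1] on two GPUs with 4 time units of serial work, LPT yields loads of 18 and 18, so the estimate coincides with the classic bound `amdahl_speedup(0.9, 2)`, about 1.82. Any imbalance raises `max(loads)` above 18 and pushes the speedup below that bound, which is why the near-field scheduling matters.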

Key words: hybrid architecture, GPU, Fast Multipole Method (FMM), PetFMM, pipelining
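The abstract does not describe the two levels of its pipelining model, so the following sketch only models the CPU/GPU overlap it mentions, as a single two-stage pipeline. It assumes each batch of near-field work has a CPU stage (preparing particle data and transferring it) followed by a GPU stage (kernel execution), that each stage handles batches one at a time and in order, and that buffering between the stages is unlimited; `sequential_time` and `pipelined_time` are hypothetical names.

```python
"""Sketch: how overlapping CPU and GPU stages hides latency.

Assumed model (not the paper's): batch i runs a CPU stage cpu[i] and then a
GPU stage gpu[i]; each stage processes one batch at a time, in order, and the
CPU may run arbitrarily far ahead (unlimited buffering).
"""
from typing import List


def sequential_time(cpu: List[float], gpu: List[float]) -> float:
    """No overlap: every batch finishes on the GPU before the CPU starts the next."""
    return sum(cpu) + sum(gpu)


def pipelined_time(cpu: List[float], gpu: List[float]) -> float:
    """Two-stage pipeline: the CPU prepares batch i+1 while the GPU runs batch i."""
    cpu_done = 0.0  # time at which the CPU stage finishes the current batch
    gpu_done = 0.0  # time at which the GPU stage finishes the current batch
    for a, b in zip(cpu, gpu):
        # The CPU moves on to the next batch as soon as it is free.
        cpu_done += a
        # The GPU needs its previous batch finished and this batch's input ready.
        gpu_done = max(gpu_done, cpu_done) + b
    return gpu_done
```

With four batches of CPU time 2 and GPU time 3, the sequential schedule takes 20 time units, while the pipelined one takes 2 + 4 × 3 = 14: after the first batch, only the slower GPU stage remains on the critical path. With a fixed number of buffers (for example, double buffering) the CPU could not run arbitrarily far ahead, which this simple model ignores.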
