Computer Engineering and Applications, 2014, Vol. 50, Issue 3: 22-29.

• Doctoral Forum •

基于OpenCL的累积汇流并行计算

龙满生,罗文浪   

  1. 井冈山大学 电子与信息工程学院,江西 吉安 343009
  • 出版日期:2014-02-01 发布日期:2014-01-26

Parallel computing with OpenCL for flow accumulation

LONG Mansheng, LUO Wenlang   

  1. School of Electronics and Information Engineering, Jinggangshan University, Ji’an, Jiangxi 343009, China
  • Online: 2014-02-01 Published: 2014-01-26

摘要: 大尺度、高分辨率数字地形数据应用需求的增长,给计算密集型的累积汇流等数字地形分析算法带来了新的挑战。针对CPU/GPU(Graphics Processing Unit)异构计算平台的特点,提出了一种基于OpenCL(Open Computing Language)的多流向累积汇流算法的并行化策略,具有更好的平台独立性和可移植性,简化了CPU/GPU异构平台下的并行应用程序设计。累积汇流并行算法包括时空独立型的流量分配和空间依赖型的累积入流两个过程,均定义为OpenCL内核并交由OpenCL设备并行执行,其中累积入流过程借助流量转移矩阵由递归式转换为迭代式来实现并行计算。与基于流量转移矩阵的并行汇流算法相比,尽管基于单元入度矩阵的并行汇流算法可以降低迭代过程中的计算冗余,但需要采用具有较大延迟的原子操作以及需要更多的迭代次数,在有限的GPU计算资源下,两种算法性能差异不明显。实验结果表明,并行累积汇流算法在NVIDIA GeForce GT 650M GPU上获得了较好的加速比,加速性能随格网尺度增加而有所增加,其中流量分配获得了约50~70倍的加速比,累积入流获得了10~20倍的加速比,展示了利用OpenCL在GPU等并行计算设备上进行大规模数字地形分析的潜在优势。

关键词: 并行计算, 累积汇流, 图形处理器, 开放计算语言

Abstract: Growing demand for applications of large-scale, high-resolution digital terrain data poses new challenges for computationally intensive digital terrain analysis algorithms such as flow accumulation. Targeting the characteristics of heterogeneous CPU/GPU (Graphics Processing Unit) computing platforms, this paper proposes a parallelization strategy for the multiple flow direction flow accumulation algorithm based on OpenCL (Open Computing Language). The strategy offers better platform independence and portability and simplifies the design of parallel applications on heterogeneous CPU/GPU platforms. The parallel flow accumulation algorithm consists of two processes: outflow allocation, which is independent in both space and time, and inflow accumulation, which is spatially dependent. Both processes are defined as OpenCL kernels and executed in parallel on OpenCL devices. For inflow accumulation, a flow transfer matrix is used to convert the recursive formulation into an iterative one so that it can be computed in parallel. Compared with the parallel algorithm based on the flow transfer matrix, a parallel algorithm based on the cell indegree matrix from graph theory reduces redundant computation in the iterative inflow accumulation process, but it requires high-latency atomic operations and more iterations. With limited GPU computing resources, the two algorithms show no obvious difference in speedup. Experimental results show that the parallel flow accumulation algorithm achieves a good speedup on an NVIDIA GeForce GT 650M GPU, and that the speedup increases somewhat as the grid size grows: about 50~70 times for outflow allocation and 10~20 times for inflow accumulation. These results demonstrate the potential of OpenCL for large-scale digital terrain analysis on parallel computing devices such as GPUs.
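The two-stage structure described in the abstract can be illustrated with a small sequential sketch. It is not the authors' OpenCL implementation: the slope-proportional weighting, the dictionary-based data layout, and all function names below are assumptions made for illustration, and plain Python loops stand in for the per-cell work that the paper assigns to OpenCL kernels. The sketch shows (1) outflow allocation, where each cell is processed independently, (2) inflow accumulation written as repeated "move all flow one step downstream" passes, which is one way to read the transfer-matrix iteration, and (3) an indegree-driven variant that visits each cell once but updates shared downstream cells, the step that would need atomic operations on a GPU.

```python
import math

# 8-neighbour offsets as (row, col) deltas.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def allocate_outflow(dem):
    """Stage 1: per-cell outflow fractions to strictly lower neighbours.

    Assumption: fractions are proportional to slope (drop / distance).
    Each cell depends only on its own neighbourhood, so cells are independent.
    """
    rows, cols = len(dem), len(dem[0])
    weights = {}  # (r, c) -> list of ((nr, nc), fraction)
    for r in range(rows):
        for c in range(cols):
            slopes = []
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < dem[r][c]:
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    slopes.append(((nr, nc), slope))
            total = sum(s for _, s in slopes)
            weights[(r, c)] = [(cell, s / total) for cell, s in slopes]
    return weights


def accumulate_by_transfer(weights):
    """Stage 2 (iterative): every pass moves in-transit flow one step downstream.

    Each cell gathers from its upstream cells, so a pass has no write
    conflicts, but every cell is recomputed in every pass.
    """
    inflows = {cell: [] for cell in weights}  # target -> [(source, fraction)]
    for source, outs in weights.items():
        for target, frac in outs:
            inflows[target].append((source, frac))

    acc = {cell: 1.0 for cell in weights}  # each cell contributes one unit
    in_transit = dict(acc)
    passes = 0
    while any(v > 0.0 for v in in_transit.values()):
        in_transit = {
            cell: sum(frac * in_transit[src] for src, frac in ups)
            for cell, ups in inflows.items()
        }
        for cell, moved in in_transit.items():
            acc[cell] += moved
        passes += 1
    return acc, passes


def accumulate_by_indegree(weights):
    """Stage 2 (indegree-driven): a cell is processed once all upstream cells are done."""
    indegree = {cell: 0 for cell in weights}
    for outs in weights.values():
        for target, _ in outs:
            indegree[target] += 1

    acc = {cell: 1.0 for cell in weights}
    ready = [cell for cell, d in indegree.items() if d == 0]
    while ready:
        cell = ready.pop()
        for target, frac in weights[cell]:
            acc[target] += frac * acc[cell]  # shared update: atomic on a GPU
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return acc


if __name__ == "__main__":
    dem = [[9, 8, 7],
           [8, 6, 5],
           [7, 5, 1]]
    w = allocate_outflow(dem)
    acc_iter, passes = accumulate_by_transfer(w)
    acc_deg = accumulate_by_indegree(w)
    print(passes, acc_iter[(2, 2)], acc_deg[(2, 2)])
```

Because flow only moves to strictly lower cells, the flow graph has no cycles and both loops terminate. The iterative version needs as many passes as the longest flow path plus one, while the indegree version touches each cell once, which mirrors the trade-off between redundant computation and conflicting updates that the abstract describes.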

Key words: parallel computing, flow accumulation, Graphics Processing Unit (GPU), Open Computing Language (OpenCL)