Novel automatic mapping technology on CPU-GPU heterogeneous systems

Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (21): 41-47.

ZHU Zhengdong1, LIU Yuan1, WEI Hongchang1, YAN Kang1, WANG Yinfeng2, DONG Xiaoshe1

1.School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
2.Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China

Online:2015-11-01 Published:2015-11-16

面向CPU-GPU架构的源到源自动映射方法

朱正东1，刘袁1，魏洪昌1，颜康1，王寅峰2，董小社1

1.西安交通大学电子与信息工程学院，西安 710049
2.深圳信息职业技术学院，广东深圳 518172

Abstract

To ease the development and porting of GPU-based applications, this paper proposes a source-to-source mapping approach that converts serial source code into equivalent parallel source code. The approach extracts the hierarchy of parallelizable loops from the serial source, establishes a correspondence between loop structures and GPU threads, and generates the GPU kernel function code. CPU-side control code is generated according to the read/write attributes of variable references. A compiler prototype based on this approach automatically translates C code into CUDA code. Functional and performance tests of the prototype show that the generated CUDA code is functionally equivalent to the original C code and performs significantly better, which to some extent eases the difficulty of porting compute-intensive applications to CPU-GPU heterogeneous systems.
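As an illustration of the kind of mapping the abstract describes, the following hand-written sketch pairs a serial two-level loop nest with a CUDA kernel in which each loop level corresponds to one thread dimension. It is not output of the prototype: the names `add_serial`, `add_kernel`, and `run_mapped`, the 16x16 block size, and the copy policy (read-only arrays copied to the device, the written array copied back) are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Serial reference: a two-level loop nest with no cross-iteration dependences.
static void add_serial(const float *a, const float *b, float *c, int rows, int cols) {
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

// Hypothetical kernel: outer loop index i maps to the y thread dimension,
// inner loop index j maps to the x thread dimension.
__global__ void add_kernel(const float *a, const float *b, float *c, int rows, int cols) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows && j < cols)  // guard threads that fall outside the loop bounds
        c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

// Host control code. Returns the number of elements that differ from the
// serial result, or -1 if allocation or a CUDA call failed.
int run_mapped(int rows, int cols) {
    size_t n = (size_t)rows * cols;
    size_t bytes = n * sizeof(float);
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes), *ref = (float *)malloc(bytes);
    float *da = NULL, *db = NULL, *dc = NULL;
    int mismatches = -1;

    if (a && b && c && ref) {
        for (size_t k = 0; k < n; ++k) {
            a[k] = (float)(k % 100);
            b[k] = (float)(k % 7);
            c[k] = -1.0f;  // sentinel so a kernel that never ran is detected
        }
        add_serial(a, b, ref, rows, cols);

        if (cudaMalloc(&da, bytes) == cudaSuccess &&
            cudaMalloc(&db, bytes) == cudaSuccess &&
            cudaMalloc(&dc, bytes) == cudaSuccess) {
            // a and b are only read in the loop body: copy host to device.
            cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
            cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

            dim3 block(16, 16);
            dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
            add_kernel<<<grid, block>>>(da, db, dc, rows, cols);

            // c is only written in the loop body: copy device to host afterwards.
            if (cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
                mismatches = 0;
                for (size_t k = 0; k < n; ++k)
                    if (c[k] != ref[k]) ++mismatches;
            }
        }
    }
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c); free(ref);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_mapped(37, 53));
    return 0;
}
```

The sizes 37 and 53 are deliberately not multiples of the block size, so the bounds check in the kernel is exercised. Comparing against the serial result mirrors the functional-equivalence check mentioned in the abstract.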

Key words: General-Purpose Graphics Processing Unit (GPGPU), Compute Unified Device Architecture (CUDA), automatic mapping, source-to-source compilation

摘要： 针对GPU上应用开发移植困难的问题，提出了一种串行计算源程序到并行计算源程序的映射方法。该方法从串行源程序中获得可并行化循环的层次信息，建立循环体结构与GPU线程的对应关系，生成GPU端核心函数代码；根据变量引用读写属性生成CPU端控制代码。基于该方法实现了一个编译原型系统，完成了C语言源程序到CUDA源程序的自动生成。对原型系统在功能和性能方面的测试结果表明，该系统生成的CUDA源程序与C语言源程序在功能上一致，其性能有显著提高，在一定程度上解决了计算密集型应用向CPU-GPU异构多核系统移植困难的问题。

关键词: 通用计算图形处理器（GPGPU）, 统一计算架构（CUDA）, 自动映射, 源到源编译

ZHU Zhengdong, LIU Yuan, WEI Hongchang, YAN Kang, WANG Yinfeng, DONG Xiaoshe. Novel automatic mapping technology on CPU-GPU heterogeneous systems[J]. Computer Engineering and Applications, 2015, 51(21): 41-47.

朱正东，刘袁，魏洪昌，颜康，王寅峰，董小社. 面向CPU-GPU架构的源到源自动映射方法[J]. 计算机工程与应用, 2015, 51(21): 41-47.
