Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (23): 31-34.

• Theoretical Research, R&D and Design •


Parallelization of BP algorithm and example verification based on CUDA

SUN Xiangyu, FENG Baiming, YANG Pengfei   

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2013-12-01  Published: 2016-06-12


Abstract: CUDA is a widely used model for GPGPU (General-Purpose Computing on GPU), and the BP algorithm is one of the most widely used neural network models at present. This paper proposes a method for parallelizing the BP algorithm with CUDA. When this method is used to train a BP neural network, the training data are transferred to the GPU before training begins; the computation of the inputs, outputs, and errors of the hidden and output layers, as well as the updating of the weights and biases, is then performed entirely on the GPU. Training on handwritten digit images with this method achieves a speedup of 6.12 to 8.17 over training on a quad-core CPU. When the networks trained on the CPU and on the GPU are used to recognize the same test set, the recognition rate of the GPU-trained network is 0.05% to 0.22% higher than that of the CPU-trained one.
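The per-neuron computation described in the abstract maps naturally onto CUDA's thread model: each thread can compute the weighted sum and activation for one neuron of a layer. The kernel below is a minimal sketch of such a hidden-layer forward pass; the kernel name, variable names, and sigmoid activation are illustrative assumptions, not code taken from the paper.

```cuda
// Sketch: one thread per hidden-layer neuron.
// input      - nInput values of the current training sample (device memory)
// weights    - row-major matrix of size nHidden x nInput (device memory)
// bias       - nHidden bias values (device memory)
// hiddenOut  - nHidden activations written by the kernel (device memory)
__global__ void hiddenForward(const float *input, const float *weights,
                              const float *bias, float *hiddenOut,
                              int nInput, int nHidden)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // neuron index
    if (j < nHidden) {
        float sum = bias[j];
        for (int i = 0; i < nInput; ++i)
            sum += input[i] * weights[j * nInput + i];
        hiddenOut[j] = 1.0f / (1.0f + expf(-sum));  // sigmoid activation
    }
}
```

A launch such as `hiddenForward<<<(nHidden + 255) / 256, 256>>>(...)` would cover all hidden neurons with 256-thread blocks; the output-layer forward pass and the weight/bias update steps can be organized the same way, one thread per neuron or per weight.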

Key words: Back-Propagation(BP) algorithm, parallelization, Compute Unified Device Architecture(CUDA), handwritten digits training