Parallelization of BP algorithm and example verification based on CUDA

Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (23): 31-34.

SUN Xiangyu, FENG Baiming, YANG Pengfei

1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Online: 2013-12-01 Published: 2016-06-12

Abstract

Abstract: CUDA is a widely used GPGPU (General-Purpose computing on GPU) model, and the BP algorithm is one of the most widely used neural network models at present. This paper proposes a method for parallelizing the BP algorithm using CUDA. When the method is used to train a BP neural network, the data are transferred to the GPU before training begins; the computation of the inputs, outputs, and errors of the hidden and output layers and the updating of the weights and biases are then all carried out on the GPU. In a handwritten-digit image training experiment, the method achieves a speed-up ratio of 6.12 to 8.17 over training on a four-core CPU. When the results trained on the CPU and on the GPU are each used to recognize the same test set, the recognition rate obtained with the GPU-trained result is 0.05% to 0.22% higher than that obtained with the CPU-trained result.

Key words: Back-Propagation (BP) algorithm, parallelization, Compute Unified Device Architecture (CUDA), handwritten digit training
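
The training procedure summarized in the abstract can be illustrated with a minimal serial sketch in Python. This is not the authors' CUDA implementation: the single hidden layer, sigmoid activation, squared-error loss, learning rate, and all function names (`forward`, `train_step`, `init_layer`) are assumptions made for illustration. The comments mark loops whose iterations are independent of one another, which is the kind of work a GPU version could plausibly distribute across threads.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w, b):
    """Outputs of one fully connected layer; w[j][i] links input i to neuron j."""
    # Each neuron j depends only on x and its own weights, so the neurons
    # can be evaluated independently (hypothetically, one thread per neuron).
    return [sigmoid(sum(wj[i] * x[i] for i in range(len(x))) + bj)
            for wj, bj in zip(w, b)]


def train_step(x, target, w_h, b_h, w_o, b_o, lr=0.5):
    """One BP update for one sample; returns the squared error before the update."""
    hidden = forward(x, w_h, b_h)
    out = forward(hidden, w_o, b_o)

    # Output-layer error terms: sigmoid derivative times output error.
    d_out = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
    # Hidden-layer error terms, propagated back through the output weights.
    d_hid = [h * (1 - h) * sum(d_out[k] * w_o[k][j] for k in range(len(d_out)))
             for j, h in enumerate(hidden)]

    # Weight and bias updates; each weight is updated independently.
    for k, dk in enumerate(d_out):
        for j, h in enumerate(hidden):
            w_o[k][j] += lr * dk * h
        b_o[k] += lr * dk
    for j, dj in enumerate(d_hid):
        for i, xi in enumerate(x):
            w_h[j][i] += lr * dj * xi
        b_h[j] += lr * dj

    return sum((t - o) ** 2 for o, t in zip(out, target))


def init_layer(n_out, n_in, rng):
    """Small random weights and biases for a layer (illustrative initialization)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)
    w_h, b_h = init_layer(4, 3, rng)   # 3 inputs -> 4 hidden neurons
    w_o, b_o = init_layer(2, 4, rng)   # 4 hidden -> 2 outputs
    sample, label = [0.2, 0.9, 0.4], [1.0, 0.0]
    for step in range(5):
        print(step, train_step(sample, label, w_h, b_h, w_o, b_o))
```

In this sketch the weights and biases stay in the same lists across calls to `train_step`, loosely mirroring the abstract's point that data are moved to the GPU once before training rather than on every iteration.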

SUN Xiangyu, FENG Baiming, YANG Pengfei. Parallelization of BP algorithm and example verification based on CUDA[J]. Computer Engineering and Applications, 2013, 49(23): 31-34.
