Computer Engineering and Applications, 2017, Vol. 53, Issue (8): 1-7. DOI: 10.3778/j.issn.1002-8331.1610-0244

Research on parallel strategy of convolution neural network in distributed environment

ZHANG Renqi, LI Jianhua, FAN Lei   

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240, China
  • Online: 2017-04-15 Published: 2017-04-28

分布式环境下卷积神经网络并行策略研究

张任其,李建华,范  磊   

  1. 上海交通大学 电子信息与电气工程学院,上海 200240

Abstract: Convolutional neural networks are usually trained serially with the standard error back-propagation (BP) algorithm. As data sets grow, serial training on a single machine becomes time-consuming and consumes substantial system resources. To train convolutional neural networks efficiently on massive data, a parallel training model for BP neural networks based on the MapReduce framework is proposed. The model combines the standard error back-propagation algorithm with the cumulative error back-propagation algorithm and partitions a large data set into several subsets, which are processed in parallel at the cost of a small loss in accuracy. The MNIST data set is extended for an image recognition test. Experimental results show that the algorithm adapts well to the data size and improves the training efficiency of convolutional neural networks.

Key words: convolutional neural networks, back propagation (BP) algorithm, Hadoop parallel strategy
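
As a rough illustration of the idea summarized in the abstract, the following Python sketch splits a data set into shards, runs per-sample (standard BP style) updates on each shard in a map step, and averages the resulting weights in a reduce step. It is not the authors' implementation: the single linear unit standing in for a convolutional network, the weight-averaging rule, and all names (map_train, reduce_average, train, make_data) are assumptions chosen for brevity, and the shards are processed in a plain loop rather than on a Hadoop cluster.

```python
import random


def map_train(shard, weights, lr=0.05):
    """Map step (assumed): per-sample updates on one data shard."""
    w = list(weights)
    for x, y in shard:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # For a linear unit, the gradient of 0.5 * err**2 is err * x.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def reduce_average(weight_lists):
    """Reduce step (assumed): average each weight across shards."""
    n = len(weight_lists)
    return [sum(ws) / n for ws in zip(*weight_lists)]


def train(data, num_shards=4, epochs=20, lr=0.05):
    """Alternate map and reduce steps, starting from zero weights."""
    dim = len(data[0][0])
    weights = [0.0] * dim
    shards = [data[i::num_shards] for i in range(num_shards)]
    for _ in range(epochs):
        # In a real MapReduce job these calls would run on separate nodes.
        local = [map_train(s, weights, lr) for s in shards]
        weights = reduce_average(local)
    return weights


def make_data(n=200, seed=0):
    """Hypothetical toy data: noiseless target y = 2*x1 - 1, with a bias input."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        data.append(((x1, 1.0), 2.0 * x1 - 1.0))
    return data


if __name__ == "__main__":
    print(train(make_data()))
```

On this toy problem the averaged weights should approach the slope 2 and bias -1 used to generate the data. How closely such averaging matches serial training on a real convolutional network depends on the shard sizes and on how often the weights are synchronized, which this sketch does not explore.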