Computer Engineering and Applications, 2018, Vol. 54, Issue (2): 137-143. DOI: 10.3778/j.issn.1002-8331.1608-0069


Training BP neural networks with MapReduce based on sample data slice disruptions

CHEN Wanghu, YU Maoyi, MA Shengjun   

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Online: 2018-01-15 Published: 2018-01-31

基于输入分片扰乱的BP神经网络MapReduce训练方法

陈旺虎,俞茂义,马生俊   

  1. 西北师范大学 计算机科学与工程学院,兰州 730070

Abstract: When a BP neural network is trained with MapReduce, the intermediate weight matrix produced by each map task converges only on the sample data slice held by that task's node, so convergence of the network over the whole training sample set is hard to achieve. An approach to training BP networks with MapReduce based on sample slice disruption is proposed. Systematic sampling over the whole training sample set produces a new input slice for each map training task, and each task uses its new slice as input in subsequent training rounds, which accelerates convergence of the BP network. Moreover, to speed up the local convergence of the map training tasks, the intermediate weight matrix with the minimum global error is taken as the initial weight matrix for subsequent training. Experimental results on Hadoop clusters show that the approach improves the efficiency of BP neural network training with MapReduce.
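
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sampling interval, the error measure, or the network structure, so the function names (`disrupt_slices`, `global_error`, `select_initial_weights`), the choice of one sample per slice count as the sampling interval, and the single linear unit standing in for a BP network are all assumptions made for illustration.

```python
def disrupt_slices(slices):
    """Redistribute samples across slices by systematic sampling.

    Assumption: the current slices are concatenated into one pool and
    new slice i takes every k-th sample starting at position i, where
    k is the number of slices (one slice per map task).
    """
    pool = [sample for s in slices for sample in s]
    k = len(slices)
    return [pool[i::k] for i in range(k)]


def global_error(weights, samples):
    """Mean squared error over the whole sample set.

    Assumption: a single linear unit stands in for the BP network so
    that the example stays self-contained.
    """
    total = 0.0
    for inputs, target in samples:
        prediction = sum(w * x for w, x in zip(weights, inputs))
        total += (prediction - target) ** 2
    return total / len(samples)


def select_initial_weights(candidates, samples):
    """Pick the candidate weight vector with the minimum global error.

    The chosen weights would seed every map task in the next round.
    """
    return min(candidates, key=lambda w: global_error(w, samples))


if __name__ == "__main__":
    # Two map tasks, each initially holding a contiguous slice.
    slices = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(disrupt_slices(slices))

    # Toy data for y = 2x and three candidate weight vectors.
    data = [((1.0,), 2.0), ((2.0,), 4.0)]
    print(select_initial_weights([[1.0], [2.0], [3.0]], data))
```

In this sketch each new slice mixes samples from every previous slice, so a map task no longer sees only the data it started with, and the weight vector selected for the next round is judged against the whole sample set rather than a single slice.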

Key words: neural network, MapReduce, sample slices, convergence

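Abstract (from the Chinese): In MapReduce training of a BP neural network, the intermediate weight matrix produced by each map training task converges only on the input slice at that training node. To improve the training efficiency of BP neural networks and ensure the global convergence of MapReduce training, a MapReduce training method based on input slice disruption is proposed. Systematic sampling of the training sample set disrupts the input slices and produces new ones; iterative training on the new input slices, starting from each map task's existing weight matrix, accelerates the convergence of MapReduce training. To improve the local convergence speed of the map training tasks, after each round of training the weight matrix with the smallest global error among those produced by the map tasks is selected as the initial weight matrix of every map training task in the next round. Experiments on a Hadoop cluster show that the method greatly improves the efficiency of training BP neural networks with MapReduce.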

Key words (from the Chinese): neural network, MapReduce, input slices, convergence