Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (15): 110-116. DOI: 10.3778/j.issn.1002-8331.2010-0081

• Big Data and Cloud Computing •

Deep Convolutional Neural Network Algorithm Based on Feature Map in Big Data Environment

MAO Yimin, ZHANG Ruipeng, GAO Bo   

  1. School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China
    2. Xi’an Geological Survey Center of China Geological Survey, Xi’an 710000, China
  • Online: 2022-08-01    Published: 2022-08-01

Abstract: To address the problems of excessive redundant network parameters, poor parameter optimization ability and low parallel efficiency in the DCNN (deep convolutional neural network) algorithm under the big data environment, this paper proposes MR-FPDCNN, a deep convolutional neural network algorithm based on feature maps and parallel computing entropy using MapReduce. The algorithm designs FMPTL (feature map pruning based on Taylor loss) and pre-trains the network to obtain a compressed DCNN, which effectively reduces the redundant parameters and lowers the computational cost of DCNN training. It also proposes IFAS (improved firefly algorithm based on the information sharing strategy, ISS), initializes the DCNN parameters with IFAS, and trains the DCNN in parallel, which improves the optimization ability of the network. In the Reduce phase, DLBPCE (dynamic load balancing strategy based on parallel computing entropy) is proposed to collect the global training results and to group data quickly and evenly, which improves the parallel efficiency of the cluster. Experimental results show that the algorithm not only reduces the computational cost of DCNN training in the big data environment, but also improves the parallelization performance of the parallel system.
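A note on the pruning idea (illustrative only): the abstract does not spell out the exact FMPTL criterion, but first-order Taylor-expansion pruning generally scores each feature map by the average magnitude of activation × gradient of the loss and removes the lowest-scoring maps. The PyTorch-style sketch below shows that generic scoring step under those assumptions; the function name taylor_scores, the toy model and the 25% pruning ratio are hypothetical and are not taken from the paper.

import torch
import torch.nn as nn

def taylor_scores(model, conv_layer, inputs, targets, criterion=nn.CrossEntropyLoss()):
    # Score each output feature map of conv_layer with a first-order Taylor
    # criterion: mean |activation * d(loss)/d(activation)| over batch and space.
    acts = {}

    def hook(_, __, output):
        output.retain_grad()        # keep the gradient of this intermediate activation
        acts["fm"] = output

    handle = conv_layer.register_forward_hook(hook)
    loss = criterion(model(inputs), targets)
    loss.backward()
    handle.remove()

    fm = acts["fm"]                 # shape: (N, C, H, W)
    return (fm * fm.grad).abs().mean(dim=(0, 2, 3))   # one score per feature map

# Toy usage: rank the 16 feature maps of the first conv layer and mark the
# least important quarter as pruning candidates.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
scores = taylor_scores(model, model[0], x, y)
prune_candidates = scores.argsort()[: 16 // 4]
print(prune_candidates)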

Key words: deep convolutional neural network (DCNN) algorithm, MapReduce framework, feature map pruning based on Taylor loss (FMPTL) strategy, improved firefly algorithm based on information sharing strategy (IFAS), dynamic load balancing strategy based on parallel computing entropy (DLBPCE)
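A note on the load-balancing idea (illustrative only): the abstract does not define the parallel computing entropy or the exact DLBPCE grouping rule. A common formulation treats the share of records assigned to each Reduce task as a probability and measures the Shannon entropy of that distribution, which is maximized when the load is perfectly balanced. The sketch below pairs that entropy with a simple greedy grouping of keyed data onto the least-loaded reducer; the function names and the example key sizes are hypothetical and are not taken from the paper.

import math
from collections import defaultdict

def parallel_computing_entropy(loads):
    # Shannon entropy of the load distribution over reducers; a perfectly
    # balanced assignment reaches the maximum value log(number of reducers).
    total = sum(loads)
    probs = [load / total for load in loads if load > 0]
    return -sum(p * math.log(p) for p in probs)

def greedy_partition(key_sizes, num_reducers):
    # Assign each (key, record_count) group to the currently least-loaded
    # reducer, taking the largest groups first so skewed keys are spread out.
    loads = [0] * num_reducers
    assignment = defaultdict(list)
    for key, size in sorted(key_sizes.items(), key=lambda kv: -kv[1]):
        r = min(range(num_reducers), key=lambda i: loads[i])
        loads[r] += size
        assignment[r].append(key)
    return assignment, loads

# Toy usage: spread six skewed key groups over three reducers and compare the
# resulting entropy with the balanced optimum log(3).
sizes = {"k1": 900, "k2": 300, "k3": 250, "k4": 200, "k5": 150, "k6": 100}
assignment, loads = greedy_partition(sizes, 3)
print(loads, round(parallel_computing_entropy(loads), 3), round(math.log(3), 3))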