Research Progress of Neural Network Based on Non-Gradient Optimization Methods

doi:10.3778/j.issn.1002-8331.2203-0195

Abstract

Neural network optimization is a fundamental, frontier topic in machine learning. Compared with purely gradient-based optimization algorithms, non-gradient algorithms show greater advantages in addressing slow convergence, the tendency to become trapped in local optima, and the inability to handle non-differentiable problems. Building on an analysis of the strengths and weaknesses of gradient-based neural network methods, this paper reviews a selection of non-gradient optimization methods, including feedforward neural network optimization and stochastic search optimization. It then analyzes the advantages, disadvantages, and applications of these methods in terms of their basic theory, the steps for training a neural network, and convergence. Finally, it summarizes the theoretical and practical challenges facing non-gradient algorithms for training neural networks and outlines future research directions.

Key words: deep learning, neural network, training algorithm, optimization theory, non-gradient optimization algorithm

SHENG Lei, CHEN Xiliang, KANG Kai. Research Progress of Neural Network Based on Non-Gradient Optimization Methods[J]. Computer Engineering and Applications, 2022, 58(17): 34-49.
