Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (14): 133-143. DOI: 10.3778/j.issn.1002-8331.2304-0250

• Pattern Recognition and Artificial Intelligence •

Graph Convolutional Neural Networks Optimized by Momentum Cosine Similarity Gradient

YAN Jianhong, DUAN Yunhui   

  1. School of Computer Science and Technology, Taiyuan Normal University, Jinzhong, Shanxi 030619, China
  2. School of Mathematics and Statistics, Taiyuan Normal University, Jinzhong, Shanxi 030619, China
  • Online: 2024-07-15    Published: 2024-07-15

Abstract: The traditional gradient descent algorithm only accumulates historical gradients with exponential weighting and does not exploit the local change of the gradient, so the optimization process may overshoot the global optimum and, even when it converges, oscillate around the optimal solution; training a graph convolutional neural network with it therefore suffers from slow convergence and low test accuracy. This paper uses the cosine similarity between two consecutive gradients to dynamically adjust the learning rate and proposes the cosine similarity gradient descent (SimGrad) algorithm. To further improve the convergence speed and test accuracy of graph convolutional neural network training and to reduce oscillation, the momentum cosine similarity gradient descent (NSimGrad) algorithm is proposed by incorporating the momentum idea. Convergence analysis proves that both SimGrad and NSimGrad achieve an $O(\sqrt{T})$ regret bound. The algorithms are tested on three constructed non-convex functions and, combined with graph convolutional neural networks, evaluated on four datasets. The results show that SimGrad guarantees the convergence of the graph convolutional neural network, NSimGrad further improves the convergence speed and test accuracy of its training, and both SimGrad and NSimGrad exhibit better global convergence and optimization ability than Adam and Nadam.
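For readers unfamiliar with regret bounds, the standard online-optimization definition (the textbook definition, not a formula quoted from this paper) is

```latex
% Regret of an online optimizer after T rounds: cumulative loss of the
% iterates \theta_t minus the loss of the best fixed parameter in hindsight.
R(T) \;=\; \sum_{t=1}^{T} f_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} f_t(\theta)
```

so an $O(\sqrt{T})$ bound means the average regret $R(T)/T$ vanishes as the number of iterations grows.

The abstract does not spell out the exact update rules, so the following Python sketch is only one plausible reading of the idea: the base learning rate is rescaled by the cosine similarity between the current and previous gradients (a SimGrad-style step), and an exponentially weighted momentum buffer is added for the NSimGrad-style variant. All names (`simgrad_step`, `nsimgrad_step`, `base_lr`, `beta`) and the specific rescaling map are illustrative assumptions, not the authors' published update rules.

```python
import numpy as np

def cosine_similarity(g_prev, g_curr, eps=1e-8):
    # Cosine similarity between two consecutive gradient vectors.
    return float(g_prev @ g_curr /
                 (np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + eps))

def simgrad_step(theta, g_curr, g_prev, base_lr=0.01):
    # SimGrad-style step (assumed form): shrink the step when consecutive
    # gradients point in conflicting directions, keep it large when they agree.
    sim = cosine_similarity(g_prev, g_curr)   # in [-1, 1]
    lr = base_lr * (1.0 + sim) / 2.0          # assumed map to [0, base_lr]
    return theta - lr * g_curr

def nsimgrad_step(theta, g_curr, g_prev, velocity, base_lr=0.01, beta=0.9):
    # NSimGrad-style step (assumed form): the same similarity-scaled learning
    # rate applied to an exponentially weighted momentum buffer.
    sim = cosine_similarity(g_prev, g_curr)
    lr = base_lr * (1.0 + sim) / 2.0
    velocity = beta * velocity + (1.0 - beta) * g_curr
    return theta - lr * velocity, velocity
```

In a training loop one would keep the previous gradient and call, for example, `theta = simgrad_step(theta, g_t, g_prev)` at each iteration; the momentum variant additionally carries the `velocity` buffer between steps.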

Key words: gradient descent algorithms, cosine similarity, graph convolutional neural network, regret bound, global convergence