Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (9): 83-90. DOI: 10.3778/j.issn.1002-8331.2104-0264

• Theory, Research and Development •


AdaSVRG: Accelerating SVRG by Adaptive Learning Rate

JI Meng, HE Qinglong   

  1. School of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
  • Online: 2022-05-01  Published: 2022-05-01


Abstract: In deep learning tasks, stochastic variance-reduced gradient methods achieve better stability and higher computational efficiency by reducing the variance of the stochastic gradients. However, these methods employ a constant learning rate throughout the learning process, which limits their computational efficiency. Building on the stochastic variance-reduced gradient method, this paper borrows the idea of momentum acceleration, adopts a weighted-average strategy for the gradient estimate, and automatically adjusts the learning rate using historical gradient information, yielding the adaptive stochastic variance-reduced gradient method (AdaSVRG). The effectiveness of the proposed AdaSVRG is verified on the MNIST and CIFAR-10 data sets. Experimental results show that AdaSVRG outperforms the stochastic variance-reduced gradient method and stochastic gradient descent in terms of convergence speed and stability.

Key words: deep learning, stochastic variance-reduced gradient method, adaptive learning rate, momentum
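
The abstract describes AdaSVRG only at a high level (SVRG-style variance reduction, a weighted-average/momentum gradient estimate, and a learning rate adjusted from historical gradient information), so the NumPy sketch below is an illustrative reconstruction under those assumptions, not the authors' exact algorithm: it applies the SVRG gradient estimator to a toy least-squares problem, smooths the estimate with an exponential weighted average, and scales the step with an AdaGrad-style accumulator of past squared gradients. All names, hyperparameters (eta0, beta, eps) and the toy data are hypothetical.

import numpy as np

# Toy problem: least squares, F(w) = (1/2n) * ||X w - y||^2.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the i-th sample loss 0.5 * (x_i^T w - y_i)^2.
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    # Full-batch gradient, recomputed once per outer epoch (SVRG snapshot).
    return X.T @ (X @ w - y) / n

def ada_svrg(epochs=20, m=200, eta0=0.5, beta=0.9, eps=1e-8):
    # Hypothetical AdaSVRG-style loop: SVRG variance reduction plus a
    # weighted-average (momentum) gradient estimate and an AdaGrad-style
    # per-coordinate step size built from historical squared gradients.
    w = np.zeros(d)
    v = np.zeros(d)   # weighted-average gradient estimate (momentum)
    h = np.zeros(d)   # accumulated squared gradients (history)
    for _ in range(epochs):
        w_snap = w.copy()        # snapshot parameters
        mu = full_grad(w_snap)   # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(w, i) - grad_i(w_snap, i) + mu   # SVRG gradient estimate
            v = beta * v + (1.0 - beta) * g             # weighted average of estimates
            h += g * g                                  # historical gradient information
            w -= eta0 / np.sqrt(h + eps) * v            # adaptive learning-rate update
    return w, 0.5 * np.mean((X @ w - y) ** 2)

w_hat, final_loss = ada_svrg()
print(f"final training loss: {final_loss:.6f}")

Under these assumptions, the per-coordinate factor eta0 / sqrt(h + eps) plays the role of the adaptive learning rate driven by historical gradients, while v corresponds to the weighted-average (momentum) gradient estimate mentioned in the abstract.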