Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (22): 117-123. DOI: 10.3778/j.issn.1002-8331.2003-0261

• Pattern Recognition and Artificial Intelligence •

Batch Subtraction Update Variance Reduction Gradient Descent Algorithm BSUG

SONG Jie, ZHU Yong, XU Bing   

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Online: 2020-11-15  Published: 2020-11-13

Abstract:

Machine learning problems are usually reduced to optimizing an objective function. Following SGD (Stochastic Gradient Descent), SVRG (Stochastic Variance Reduction Gradient) has become a mainstream algorithm for optimizing the parameters of such objective functions, and it has been widely studied because it achieves linear convergence by removing the effect of gradient variance. Its introduction spurred further variance-reduction algorithms such as SAGA (Stochastic Average Gradient Average) and SCSG (Stochastically Controlled Stochastic Gradient), but these methods suffer from excessive memory consumption and slow iteration. To achieve low storage cost and fast iteration, a new SVRG-based variance-reduction algorithm, BSUG (Batch Subtraction Update Gradient), is designed. The improvements are twofold: the average gradient is computed over a small mini-batch of samples rather than the full data set, and the average gradient is updated by subtraction. In each epoch, a small batch of samples is drawn at random to compute the average gradient, and during the inner iterations the average gradient is refreshed by discarding gradients of past models. By choosing a suitably small batch size B, both memory storage and the number of iterations can be reduced. The convergence of the algorithm is analyzed theoretically, and the algorithm is implemented in Python. Comparisons with Mini-Batch SGD, AdaGrad, RMSProp, SVRG and SCSG demonstrate the effectiveness of BSUG, and an investigation of its hyperparameters demonstrates its stability.
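
To make the described update concrete, the short Python sketch below illustrates one plausible reading of the abstract: a mini-batch average gradient is computed from a model snapshot, and during the inner loop that average is updated by subtracting the outdated gradient of the revisited sample and substituting the fresh one. The function bsug_sketch, its parameters, and the exact form of the subtraction update are illustrative assumptions for exposition; they are not the authors' published pseudocode.

import numpy as np

def bsug_sketch(grad_fn, w0, n_samples, epochs=20, batch_B=64,
                inner_T=100, lr=0.05, seed=0):
    """Illustrative BSUG-style loop; grad_fn(w, i) returns the gradient
    of the i-th sample's loss at w. Not the authors' reference code."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        w_snap = w.copy()                                  # model snapshot
        batch = rng.choice(n_samples, size=batch_B, replace=False)
        snap_grads = {i: grad_fn(w_snap, i) for i in batch}
        mu = sum(snap_grads.values()) / batch_B            # mini-batch average gradient
        for _ in range(inner_T):
            i = int(rng.choice(batch))                     # resample within the batch
            g_new = grad_fn(w, i)
            g_old = snap_grads[i]
            # SVRG-style variance-reduced step built on the mini-batch average
            w -= lr * (g_new - g_old + mu)
            # "subtraction update": discard the outdated gradient of sample i
            # from the running average and substitute the fresh one
            mu += (g_new - g_old) / batch_B
            snap_grads[i] = g_new
    return w

# Toy usage on a least-squares problem (purely illustrative)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 10)), rng.normal(size=500)
w_fit = bsug_sketch(lambda w, i: (X[i] @ w - y[i]) * X[i],
                    np.zeros(10), n_samples=500)

Under this reading, only the B snapshot gradients of the current batch are kept in memory, which is consistent with the claimed storage saving relative to SAGA-style methods that store one gradient per training sample.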

Key words: machine learning, optimization, mini-batch, subtraction update, Stochastic Variance Reduction Gradient (SVRG)