Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (7): 1-12. DOI: 10.3778/j.issn.1002-8331.2307-0370

• Hot Topics and Reviews •


Review of Development of Deep Learning Optimizer

CHANG Xilong, LIANG Kun, LI Wentao   

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
  • Online:2024-04-01 Published:2024-04-01


Abstract: The optimizer is the most critical factor in improving the performance of deep learning models: by minimizing the loss function, it drives the model's parameters toward the true parameters and thereby improves performance. As large language models (LLMs) such as GPT have become the research focus in natural language processing, traditional optimizers built around gradient descent have shown limited effect on large models. Adaptive moment estimation optimizers have therefore emerged, and they significantly outperform traditional optimizers in generalization ability. Taking gradient descent, adaptive gradient, and adaptive moment estimation optimizers as the main thread, this paper analyzes their principles, strengths, and weaknesses. The optimizers are applied to the Transformer architecture, with a French-English translation task selected as the evaluation benchmark, and experiments examine how the optimizers differ on this specific task. The results show that adaptive moment estimation optimizers effectively improve model performance on machine translation tasks. Finally, the paper discusses future development directions for optimizers and their application scenarios on concrete tasks.
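As a concrete illustration of the adaptive moment estimation family the abstract refers to, the following is a minimal sketch of the textbook Adam update rule for a single scalar parameter; the function name `adam_step`, the default hyperparameters, and the toy quadratic objective are illustrative assumptions, not code from the paper.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (textbook form)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

The per-coordinate scaling by the second-moment estimate is what distinguishes this family from plain gradient descent: the effective step size adapts to the recent gradient magnitude of each parameter.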

Key words: optimizer, machine translation, Transformer, deep learning, learning rate warm-up algorithm
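The "learning rate warm-up" keyword refers to schedules that increase the learning rate over an initial number of steps before decaying it, which stabilizes early training of Transformers. A common instance is the inverse-square-root schedule from the original Transformer paper; the sketch below assumes that schedule, and the function name `noam_lr` and default values are illustrative.

```python
def noam_lr(step, d_model=512, warmup=4000):
    """Warm-up schedule from the original Transformer paper:
    linear increase for `warmup` steps, then inverse-square-root decay."""
    step = max(step, 1)  # guard against step == 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The two branches of the `min` cross exactly at `step == warmup`, so the learning rate peaks there and decays smoothly afterwards.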