Review of Pre-training Models for Natural Language Processing

doi:10.3778/j.issn.1002-8331.2006-0040

Computer Engineering and Applications (计算机工程与应用), 2020, Vol. 56, Issue 23: 12-22.

YU Tongrui, JIN Ran, HAN Xiaozhen, LI Jiahui, YU Ting

1. College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, Zhejiang 315100, China
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Publication date: 2020-12-01  Online release date: 2020-11-30

Abstract

In recent years, deep learning has been widely applied across many fields, and pre-training models based on deep learning have brought natural language processing into a new era. The goal of pre-training is to place a model in a good initial state so that it achieves better performance on downstream tasks. This paper first introduces pre-training technology and its development history, and then divides pre-training models, according to their characteristics, into traditional models based on probability and statistics and newer models based on deep learning. It briefly analyzes the characteristics and limitations of traditional pre-training models, focuses on deep-learning-based pre-training models, and compares and evaluates their performance on downstream tasks. It then surveys newer pre-training models that offer instructive ideas, briefly describing their improvement mechanisms and the performance gains they achieve on downstream tasks. Finally, it summarizes the problems that current pre-training models face and discusses future development trends.

Key words: deep learning, natural language processing, pre-training, word embedding, language model

YU Tongrui, JIN Ran, HAN Xiaozhen, LI Jiahui, YU Ting. Review of Pre-training Models for Natural Language Processing[J]. Computer Engineering and Applications, 2020, 56(23): 12-22.

