Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 57-74. DOI: 10.3778/j.issn.1002-8331.2305-0102

• Research Hotspots and Reviews •

Survey of Neural Machine Translation

ZHANG Junjin, TIAN Yonghong, SONG Zheyu, HAO Yufeng   

  1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
  • Online: 2024-02-15 Published: 2024-02-15

神经机器翻译综述

章钧津,田永红,宋哲煜,郝宇峰   

  1. 内蒙古工业大学  数据科学与应用学院,呼和浩特  010080

Abstract: Machine translation (MT) studies how to translate a source language into a target language, and it plays an important role in promoting communication between ethnic groups. Neural machine translation (NMT) has become the mainstream MT method owing to its translation speed and quality. To trace how the field developed, this paper first reviews the history and methods of MT, comparing and summarizing three main approaches: rule-based, statistical, and deep learning-based machine translation. It then introduces NMT and explains its common types. Next, it focuses on six main research areas of NMT: multimodal MT, non-autoregressive MT, document-level MT, multilingual MT, data augmentation, and pre-trained models. Finally, it discusses the outlook for NMT from four aspects: low-resource languages, context-sensitive translation, unknown (out-of-vocabulary) words, and large models. This systematic introduction is intended to give a clearer picture of the current state of NMT.

Key words: machine translation, neural machine translation, document-level machine translation, data augmentation, preprocessing technique

摘要: 机器翻译主要研究如何将源语言翻译为目标语言,对于促进民族之间的交流具有重要意义。目前神经机器翻译凭借翻译速度和译文质量成为主流的机器翻译方法。为更好地进行脉络梳理,首先对机器翻译的历史和方法进行研究,并对基于规则的机器翻译、基于统计的机器翻译和基于深度学习的机器翻译三种方法进行对比总结;然后引出神经机器翻译,并对其常见的类型进行讲解;接着选取多模态机器翻译、非自回归机器翻译、篇章级机器翻译、多语言机器翻译、数据增强技术和预训练模型六个主要的神经机器翻译研究领域进行重点介绍;最后从低资源语言、上下文相关翻译、未登录词和大模型四个方面对神经机器翻译的未来进行了展望。通过系统性的介绍以更好地理解神经机器翻译的发展现状。

关键词: 机器翻译, 神经机器翻译, 篇章级机器翻译, 数据增强, 预处理技术