Computer Engineering and Applications, 2020, Vol. 56, Issue (20): 118-123. DOI: 10.3778/j.issn.1002-8331.1906-0348

Mongolian-Chinese Neural Machine Translation Method Based on Document-Level Context

GAO Fen, SU Yila, REN Qing-Dao-Er-Ji

  1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China
  • Online: 2020-10-15 Published: 2020-10-13

Abstract:

This paper represents Chinese text with sub-word features and encodes Mongolian with a hybrid encoder, so that sentences are represented more comprehensively and the translation model is strengthened. To alleviate ambiguity, document-level context is also applied to Mongolian-Chinese neural machine translation. Experiments show that, on Mongolian-Chinese parallel corpora of 67,288 and 118,502 sentence pairs, applying the document-level context method raises the BLEU score by 0.9 and 0.5 respectively over the baseline system. Moreover, the improvement in BLEU becomes more pronounced as the document-level context corpus grows. These results indicate that the document-level context method can improve translation quality.

Key words: sub-word, hybrid encoder, document-level context
