Bi-directional Uyghur-Chinese Neural Machine Translation with Marked Syllables

Computer Engineering and Applications, 2021, Vol. 57, Issue (4): 161-168. DOI: 10.3778/j.issn.1002-8331.1912-0118

Hasan Wumaier, Sirajahmat Ruzmamat, Xireaili Hairela, LIU Wenqi, Tuergen Yibulayin, WANG Liejun, Wayit Abulizi

1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Xinjiang Laboratory of Multi-language Information Technology, Xinjiang University, Urumqi 830046, China
3. School of Software, Xinjiang University, Urumqi 830091, China

Online: 2021-02-15 Published: 2021-02-06

Abstract:

In recent years, neural machine translation has become the mainstream approach in machine translation, but in low-resource settings, insufficient parallel corpora and data sparseness remain major challenges. To address the data sparseness caused by the shortage of Uyghur-Chinese parallel data and the complex morphology of Uyghur, this paper exploits the syllable structure of Uyghur: words are segmented into syllables, and the syllables are marked following the BME (Begin, Middle, End) tagging idea, yielding a neural machine translation method based on marked syllables. Compared with neural machine translation systems that use word-level and BPE-level units, the proposed method improves BLEU by 7.39 and 3.04 points, respectively, on the Uyghur-to-Chinese task, and by 5.82 and 3.09 points, respectively, on the Chinese-to-Uyghur task. These results indicate that, when parallel data are scarce, the method effectively improves the quality of Uyghur-Chinese machine translation.

Keywords: neural machine translation, data sparseness, syllable level, Uyghur-Chinese neural machine translation
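
The marked-syllable idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the marker format or the syllabification rules, so the `@` separator, the function names, the choice to leave single-syllable words unmarked, and the hand-split Latin-script example words are all assumptions made here for illustration. The sketch takes words that are already split into syllables, attaches B/M/E position tags, and shows how the tags allow the original words to be restored after translation.

```python
from typing import List

SEP = "@"  # hypothetical separator between a syllable and its tag
TAGS = {"B", "M", "E"}


def mark_word(syllables: List[str]) -> List[str]:
    """Tag the syllables of one word with B (begin), M (middle), E (end)."""
    if not syllables:
        raise ValueError("a word needs at least one syllable")
    if len(syllables) == 1:
        # Assumption: a single-syllable word is left unmarked.
        return [syllables[0]]
    tags = ["B"] + ["M"] * (len(syllables) - 2) + ["E"]
    return [f"{syl}{SEP}{tag}" for syl, tag in zip(syllables, tags)]


def mark_sentence(words: List[List[str]]) -> List[str]:
    """Flatten a sentence (list of pre-syllabified words) into marked tokens."""
    return [token for word in words for token in mark_word(word)]


def restore_words(tokens: List[str]) -> List[str]:
    """Rebuild words from marked syllable tokens."""
    words: List[str] = []
    current: List[str] = []
    for token in tokens:
        syllable, sep, tag = token.rpartition(SEP)
        if not sep or tag not in TAGS:
            # Unmarked token: treat it as a complete word.
            if current:
                words.append("".join(current))
                current = []
            words.append(token)
            continue
        if tag == "B" and current:
            # A new word starts before the previous one was closed.
            words.append("".join(current))
            current = []
        current.append(syllable)
        if tag == "E":
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Illustrative, hand-split Latin-script words (not taken from the paper).
sentence = [["men"], ["mek", "tep"], ["o", "qu", "ghu", "chi"]]
marked = mark_sentence(sentence)
restored = restore_words(marked)
```

Because the tags record where each word begins and ends, a syllable-level output sequence can be joined back into words without relying on whitespace, which is one plausible reason for adding the marks on top of plain syllable segmentation.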

Hasan Wumaier, Sirajahmat Ruzmamat, Xireaili Hairela, LIU Wenqi, Tuergen Yibulayin, WANG Liejun, Wayit Abulizi. Bi-directional Uyghur-Chinese Neural Machine Translation with Marked Syllables[J]. Computer Engineering and Applications, 2021, 57(4): 161-168.

