Bi-directional Uyghur-Chinese Neural Machine Translation with Marked Syllables
Hasan Wumaier, Sirajahmat Ruzmamat, Xireaili Hairela, LIU Wenqi, Tuergen Yibulayin, WANG Liejun, Wayit Abulizi
1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Xinjiang Laboratory of Multi-language Information Technology, Xinjiang University, Urumqi 830046, China
3. School of Software, Xinjiang University, Urumqi 830091, China
In recent years, neural networks have become the mainstream approach to machine translation, but in low-resource machine translation the shortage of parallel corpora and the resulting data sparseness remain major challenges. To address the data sparseness caused by the limited Uyghur-Chinese parallel corpus and the complex morphology of Uyghur, this paper proposes a neural machine translation method that exploits the syllable structure of the Uyghur language: words are segmented into syllables, and the syllables are marked following the BME (Begin, Middle, End) scheme. Compared with word-level and BPE-level segmentation, the proposed method improves BLEU by 7.39 and 3.04 points, respectively, on the Uyghur-Chinese translation task, and by 5.82 and 3.09 points, respectively, on the Chinese-Uyghur translation task. These results indicate that, when parallel data are insufficient, the method can effectively improve the quality of Uyghur-Chinese machine translation.
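The marking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each word has already been split into syllables (the syllabification rules are not reproduced here), that a mark is attached as a suffix after a hypothetical `@` separator, and that single-syllable words are left unmarked; none of these details is specified in the abstract. The function names and the hand-split, Latin-script example words are likewise illustrative.

```python
from typing import List


def mark_syllables(syllables: List[str], sep: str = "@") -> List[str]:
    """Attach B/M/E position marks to the syllables of one word.

    Assumption: marks are written as suffixes ("syllable@B"); the paper's
    actual token format may differ.
    """
    n = len(syllables)
    if n == 0:
        return []
    if n == 1:
        # Assumption: a single-syllable word is left unmarked.
        return list(syllables)
    marked = [f"{syllables[0]}{sep}B"]                       # first syllable
    marked += [f"{s}{sep}M" for s in syllables[1:-1]]        # inner syllables
    marked.append(f"{syllables[-1]}{sep}E")                  # last syllable
    return marked


def mark_sentence(words: List[List[str]], sep: str = "@") -> str:
    """Turn a sentence (a list of pre-syllabified words) into one token string."""
    return " ".join(tok for word in words for tok in mark_syllables(word, sep))


def restore_words(tokens: List[str], sep: str = "@") -> List[str]:
    """Rebuild words from marked syllables using the B/M/E marks."""
    words: List[str] = []
    buffer: List[str] = []
    for tok in tokens:
        if tok.endswith(sep + "B"):
            buffer = [tok[: -len(sep) - 1]]          # start a new word
        elif tok.endswith(sep + "M"):
            buffer.append(tok[: -len(sep) - 1])      # continue the word
        elif tok.endswith(sep + "E"):
            buffer.append(tok[: -len(sep) - 1])      # close the word
            words.append("".join(buffer))
            buffer = []
        else:
            words.append(tok)                        # unmarked single syllable
    return words


# Hand-split illustrative input; real input would come from a syllabifier.
sentence = [["o", "qu", "ghu", "chi", "lar"], ["mek", "tep"], ["bu"]]
marked = mark_sentence(sentence)
restored = restore_words(marked.split())
```

Because the marks record each syllable's position, word boundaries can be recovered from the syllable sequence, as `restore_words` does in this sketch.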