N-best syntactic knowledge enhanced pre-reordering model for statistical machine translation

Computer Engineering and Applications, 2016, Vol. 52, Issue (17): 160-165.

GUO Junbo¹, ZHANG Xiyuan², DU Jinhua²

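1. Faculty of Higher Vocational and Technical Education, Xi’an University of Technology, Xi’an 710048, China
2. Faculty of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China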

Publication date: 2016-09-01   Release date: 2016-09-14

Abstract

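Abstract: The syntactic heterogeneity between the source and target languages has a significant impact on the performance of statistical machine translation (SMT). Building on phrase-based Chinese-English SMT, this paper proposes a source-side pre-reordering method enhanced with N-best syntactic knowledge. First, each input source sentence is parsed into N-best parse trees, and highly reliable subtree structures are identified by computing their posterior probabilities; an initial reordering rule set is then extracted from these reliable subtrees according to the word alignment links. Two optimization strategies are applied to the initial rule set: filtering by rule derivation based on Chinese and English syntactic knowledge, and a rule-probability threshold. Next, to reduce reordering inside phrases and preserve their local fluency, the source-side phrase translation table is used as a constraint, so that reordering takes place only between phrase blocks. Finally, the parse tree of each source sentence is pre-reordered with the optimized rule set under the phrase-table constraint. Chinese-English SMT experiments on the NIST 2005 and 2008 test sets show that, compared with the baseline system, the proposed method improves the BLEU score by 0.68 and 0.83 points, respectively.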

Key words: statistical machine translation, pre-reordering model, N-best parsed tree, reordering rules, rule optimization
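The abstract does not give the exact formulas, so the Python sketch below is only one plausible reading of the pipeline, under loudly stated assumptions. A subtree's reliability is taken to be the normalized score mass of the N-best parses that contain it; reordering rules are assumed to be binary patterns (parent, left child, right child) whose swap probability is estimated from word alignments; the filter based on Chinese and English syntactic knowledge is omitted and only the probability threshold is applied; and the phrase-table constraint is read as "never reorder inside a span that is itself a phrase-table entry". All names (`Node`, `subtree_posteriors`, `extract_rules`, `filter_rules`, `reorder`), the thresholds `min_post=0.5` and `threshold=0.6`, and the toy sentence, scores, and alignment are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    start: int  # first source word index (inclusive)
    end: int    # last source word index (exclusive)
    children: list = field(default_factory=list)


def leaf(label, i):
    return Node(label, i, i + 1)


def internal(label, *kids):
    return Node(label, kids[0].start, kids[-1].end, list(kids))


def subtrees(node):
    """Yield (key, node) for every internal node; the key is hashable."""
    if node.children:
        sig = tuple((c.label, c.start, c.end) for c in node.children)
        yield (node.label, node.start, node.end, sig), node
        for c in node.children:
            yield from subtrees(c)


def subtree_posteriors(nbest):
    """nbest: list of (tree, score). Assumed posterior of a subtree: the
    normalized score mass of the N-best parses that contain it."""
    total = sum(p for _, p in nbest)
    post, nodes = defaultdict(float), {}
    for tree, p in nbest:
        for key, node in dict(subtrees(tree)).items():
            post[key] += p / total
            nodes[key] = node
    return post, nodes


def target_span(node, align):
    """(min, max) target positions aligned to the node's source span, or None."""
    pos = [t for s, t in align if node.start <= s < node.end]
    return (min(pos), max(pos)) if pos else None


def extract_rules(post, nodes, align, counts=None, min_post=0.5):
    """Count [swapped, total] orders of binary, reliable subtrees.
    Pass the same `counts` across sentences to accumulate corpus statistics."""
    counts = defaultdict(lambda: [0, 0]) if counts is None else counts
    for key, prob in post.items():
        node = nodes[key]
        if prob < min_post or len(node.children) != 2:
            continue  # unreliable or non-binary subtree
        left, right = (target_span(c, align) for c in node.children)
        if left is None or right is None:
            continue  # unaligned child: no order evidence
        rule = (node.label, node.children[0].label, node.children[1].label)
        if left[0] > right[1]:    # left child translated entirely after right
            counts[rule][0] += 1
            counts[rule][1] += 1
        elif left[1] < right[0]:  # monotone, non-overlapping
            counts[rule][1] += 1
    return counts


def filter_rules(counts, threshold=0.6, min_count=1):
    """Probability-threshold filter: keep rules that swap often enough."""
    return {r for r, (swap, tot) in counts.items()
            if tot >= min_count and swap / tot >= threshold}


def reorder(node, words, rules, phrase_table):
    """Reorder the words under `node`; spans found in the phrase table stay intact."""
    if not node.children:
        return [words[node.start]]
    if " ".join(words[node.start:node.end]) in phrase_table:
        return words[node.start:node.end]  # no reordering inside a known phrase
    parts = [reorder(c, words, rules, phrase_table) for c in node.children]
    if len(parts) == 2 and (node.label, node.children[0].label,
                            node.children[1].label) in rules:
        parts.reverse()  # apply the learned swap between the two blocks
    return [w for part in parts for w in part]


# Toy usage: 在 北京 工作 ("work in Beijing"); every value here is made up.
words = ["在", "北京", "工作"]
t1 = internal("VP", internal("PP", leaf("P", 0), leaf("NR", 1)), leaf("VV", 2))
t2 = internal("VP", leaf("P", 0), internal("VP", leaf("NR", 1), leaf("VV", 2)))
post, nodes = subtree_posteriors([(t1, 0.7), (t2, 0.3)])
align = [(0, 1), (1, 2), (2, 0)]  # (source index, target index)
rules = filter_rules(extract_rules(post, nodes, align))
print(reorder(t1, words, rules, phrase_table=set()))  # → ['工作', '在', '北京']
```

Keying rules on labels rather than on words lets statistics gathered on a parallel corpus generalize to unseen sentences; in a realistic setup the counts would be accumulated over the whole training corpus before `filter_rules` is applied, and the reordered source text would then be passed to the phrase-based decoder.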

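GUO Junbo, ZHANG Xiyuan, DU Jinhua. N-best syntactic knowledge enhanced pre-reordering model for statistical machine translation[J]. Computer Engineering and Applications, 2016, 52(17): 160-165.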