Computer Engineering and Applications, 2016, Vol. 52, Issue (17): 160-165.

N-best syntactic knowledge enhanced pre-reordering model for statistical machine translation

GUO Junbo1, ZHANG Xiyuan2, DU Jinhua2   

  1. Faculty of Higher Vocational and Technical Education, Xi’an University of Technology, Xi’an 710048, China
  2. Faculty of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
  • Online: 2016-09-01 Published: 2016-09-14

N-Best句法知识增强的统计机器翻译预调序模型

郭俊博1,张喜媛2,杜金华2   

  1. 西安理工大学 高等技术学院,西安 710048
  2. 西安理工大学 自动化与信息工程学院,西安 710048

Abstract: Syntactic divergence between source and target languages has a significant impact on the performance of Statistical Machine Translation (SMT). For a phrase-based Chinese-English SMT system, a method enhanced with N-best syntactic knowledge is proposed to pre-reorder source-side sentences. First, N-best parse trees are generated for each source sentence, highly reliable sub-trees are selected by computing their posterior probabilities, and an initial reordering rule set is extracted from these sub-trees according to the word alignment links. The initial rule set is then optimized with two strategies: filtering based on bilingual (Chinese and English) syntactic knowledge, and a rule probability threshold. Second, to preserve the local fluency of phrases, the phrase table is used as a constraint so that reordering takes place only between phrases, not inside them. Finally, the optimized rule set, constrained by the phrase table, is used to pre-reorder the source-side sentences. Experiments on the NIST 2005 and 2008 test sets show that the BLEU score improves by 0.68 and 0.83, respectively, over the baseline system.

Key words: statistical machine translation, pre-reordering model, N-best parsed tree, reordering rules, rule optimization
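
The first step of the abstract (selecting reliable sub-trees from an N-best list by posterior probability) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree encoding (nested tuples), the definition of a sub-tree as (label, start, end, child labels), the posterior as a normalized sum of parse probabilities, and the threshold value 0.8 are all assumptions made here for illustration.

```python
from collections import defaultdict

def collect_subtrees(tree, start=0, out=None):
    """Walk a tree given as (label, child, ...) with words as plain strings.

    Returns (end_position, out), where out is a set of
    (label, start, end, child_labels) tuples, one per internal node.
    """
    if out is None:
        out = set()
    if isinstance(tree, str):              # a leaf word covers one position
        return start + 1, out
    label, children = tree[0], tree[1:]
    pos = start
    child_labels = []
    for child in children:
        child_labels.append(child if isinstance(child, str) else child[0])
        pos, _ = collect_subtrees(child, pos, out)
    out.add((label, start, pos, tuple(child_labels)))
    return pos, out

def subtree_posteriors(nbest):
    """nbest: list of (tree, probability). Assumed posterior of a sub-tree:
    the normalized probability mass of the parses that contain it."""
    total = sum(p for _, p in nbest)
    post = defaultdict(float)
    for tree, p in nbest:
        _, subtrees = collect_subtrees(tree)
        for s in subtrees:
            post[s] += p / total
    return dict(post)

def reliable_subtrees(nbest, threshold=0.8):
    """Keep sub-trees whose posterior reaches the (hypothetical) threshold."""
    return {s: p for s, p in subtree_posteriors(nbest).items() if p >= threshold}

# Toy 2-best list over a three-word sentence; words and scores are made up.
nbest = [
    (("S", ("NP", ("NN", "a")), ("VP", ("VV", "b"), ("NP", ("NN", "c")))), 0.6),
    (("S", ("NP", ("NN", "a"), ("NN", "b")), ("VP", ("VV", "c"))), 0.4),
]
for subtree, p in sorted(reliable_subtrees(nbest).items()):
    print(subtree, round(p, 2))
```

In this toy list only the sub-trees shared by both parses pass the threshold; reordering rules would then be extracted from such sub-trees together with the word alignment, a step this sketch does not cover.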

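Abstract (Chinese version, translated): The syntactic divergence between the source and target languages has an important effect on the performance of statistical machine translation (SMT). Building on phrase-based Chinese-English SMT, a source-language pre-reordering method enhanced with N-best syntactic knowledge is proposed. First, N-best syntactic parsing is performed on each source-language input sentence, and highly reliable sub-tree structures are obtained by computing statistical probabilities; an initial reordering rule set is then extracted from these reliable sub-tree structures according to word alignment information. Two optimization strategies are applied to the initial rule set: rule derivation filtering based on Chinese and English syntactic knowledge, and a rule probability threshold control mechanism. Next, to reduce reordering inside phrases and preserve their local fluency, the source-language phrase translation table is used as a constraint so that reordering is confined to movement between phrase chunks. Finally, the syntactic parse trees of the source-side sentences are pre-reordered according to the optimized rule set and the phrase table constraint. Chinese-English SMT experiments on the NIST 2005 and 2008 test sets show that, relative to the baseline system, the proposed method improves the BLEU automatic evaluation score by 0.68 and 0.83, respectively.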

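Key words (Chinese version, translated): statistical machine translation, pre-reordering model, N-best parse tree, reordering rules, rule optimization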
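
The phrase table constraint mentioned in the abstract (reordering only between phrases, never inside them) can be sketched as follows. This is an illustrative guess at one way to enforce such a constraint, not the method of the paper: the greedy longest-match chunking, the `max_len` limit, and the function names are all hypothetical.

```python
def chunk_boundaries(tokens, phrases, max_len=4):
    """Segment tokens by greedy left-to-right longest match against the
    source sides of a phrase table (a set of token tuples).
    Returns the set of positions that lie on a chunk boundary."""
    bounds = {0}
    i = 0
    while i < len(tokens):
        step = 1                            # fall back to a single-word chunk
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                step = n
                break
        i += step
        bounds.add(i)
    return bounds

def swap_if_allowed(tokens, i, k, j, phrases):
    """Swap the adjacent spans [i, k) and [k, j) only if all three cut
    points fall on chunk boundaries, so that no phrase is split."""
    bounds = chunk_boundaries(tokens, phrases)
    if {i, k, j} <= bounds:
        return tokens[:i] + tokens[k:j] + tokens[i:k] + tokens[j:]
    return list(tokens)

# Placeholder tokens; ("w2", "w3") stands for a phrase-table entry.
tokens = ["w1", "w2", "w3", "w4"]
phrases = {("w2", "w3")}
print(swap_if_allowed(tokens, 0, 1, 3, phrases))   # cut points respect the phrase
print(swap_if_allowed(tokens, 0, 2, 4, phrases))   # cut at 2 would split the phrase
```

A rule whose application would cut through a phrase is simply not applied in this sketch; a real system might instead score or back off such cases.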