Computer Engineering and Applications, 2012, Vol. 48, Issue (32): 98-101.


Translation rules extraction for statistical machine translation

LIU Ying, JIANG Wei   

  1. Department of Chinese Language and Literature, Tsinghua University, Beijing 100084, China
  Online: 2012-11-11; Published: 2012-11-20


Abstract: Aligned phrases are a core component of statistical machine translation (SMT). A hierarchical phrase model based on phrase structure trees is proposed; it integrates the advantages of the string-to-tree model and the hierarchical phrase model. Translation rules are extracted from bilingual aligned phrases together with English phrase structure trees, and heuristic strategies are proposed for assigning syntax labels to the newly extracted translation rules. An SMT system using these translation rules achieves higher BLEU scores than systems based on the phrase model and the hierarchical phrase model.
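The abstract does not spell out the extraction procedure. As a rough illustration only (not the authors' code), the sketch below shows the standard technique that hierarchical phrase models build on: collecting phrase pairs consistent with a word alignment, then forming Chiang-style one-gap rules by replacing a nested phrase pair with a nonterminal. The function names `extract_phrase_pairs` and `hierarchical_rules`, the `X1` notation, the `max_len` limit, and the pinyin example sentence are all hypothetical choices made for this sketch.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]                 # inclusive word indices (start, end)
PhrasePair = Tuple[Span, Span]         # (source span, target span)
Alignment = Set[Tuple[int, int]]       # word links as (source index, target index)


def extract_phrase_pairs(src: List[str], alignment: Alignment,
                         max_len: int = 4) -> Set[PhrasePair]:
    """Collect span pairs consistent with the word alignment (hypothetical helper).

    Simplification: spans are not extended over unaligned boundary words.
    """
    pairs: Set[PhrasePair] = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target words linked to the source span [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may link outside [s1, s2].
            if any((t1 <= t <= t2) and not (s1 <= s <= s2) for (s, t) in alignment):
                continue
            pairs.add(((s1, s2), (t1, t2)))
    return pairs


def hierarchical_rules(src: List[str], tgt: List[str],
                       pairs: Set[PhrasePair]) -> Set[Tuple[str, str]]:
    """Chiang-style one-gap rules: replace a nested phrase pair with X1 (hypothetical helper).

    Chiang's extra filters (e.g. at least one aligned terminal, length limits) are omitted.
    """
    rules: Set[Tuple[str, str]] = set()
    for (s1, s2), (t1, t2) in pairs:
        for (u1, u2), (v1, v2) in pairs:
            nested = s1 <= u1 and u2 <= s2 and t1 <= v1 and v2 <= t2
            proper = (u1, u2) != (s1, s2)  # each source span has one target span here
            if nested and proper:
                lhs = src[s1:u1] + ["X1"] + src[u2 + 1:s2 + 1]
                rhs = tgt[t1:v1] + ["X1"] + tgt[v2 + 1:t2 + 1]
                rules.add((" ".join(lhs), " ".join(rhs)))
    return rules


if __name__ == "__main__":
    src = ["ta", "zuotian", "lai"]          # pinyin for a Chinese sentence
    tgt = ["he", "came", "yesterday"]
    alignment = {(0, 0), (1, 2), (2, 1)}    # ta-he, zuotian-yesterday, lai-came
    pairs = extract_phrase_pairs(src, alignment)
    for rule in sorted(hierarchical_rules(src, tgt, pairs)):
        print(rule)
```

In this example the reordering between "zuotian lai" and "came yesterday" is captured by a gapped rule such as "ta X1 lai" to "he came X1", which a flat phrase table cannot express. The paper's model additionally attaches English syntax information to such rules, which this sketch does not attempt.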

Key words: statistical machine translation, translation rules, extraction, filtering, BLEU

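Abstract (translated from the Chinese version): Aligned phrases are the core module that determines the quality of a statistical machine translation system. This paper proposes a hierarchical phrase model based on phrase structure trees, an extension of the hierarchical phrase model that draws on the idea of the string-to-tree model. The model extracts translation rules from bilingual aligned phrases combined with English phrase structure trees, and uses heuristic strategies to obtain extended syntactic labels for the translation rules. An SMT system using these translation rules produces stable translation results across different data sets, and its average BLEU scores on the training and test sets are higher than those of the phrase model and the hierarchical phrase model.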
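The abstract states that heuristic strategies assign extended syntactic labels to translation rules, but it does not say what those heuristics are. Purely as an assumption-labeled illustration, the sketch below uses a different, well-known heuristic in the style of Syntax-Augmented Machine Translation (SAMT): a target span takes the label of the English constituent that covers it exactly, otherwise a combined label `A+B` if it splits into two adjacent constituents, otherwise the generic label `X`. The tree encoding, the function names `constituent_spans` and `heuristic_label`, and the example parse are hypothetical; SAMT's other label types (such as `A/B`) are left out.

```python
from typing import Dict, Optional, Tuple, Union

# A parse tree node: a leaf word (str) or a tuple (label, child1, child2, ...).
Tree = Union[str, tuple]
Span = Tuple[int, int]  # inclusive word indices (start, end)


def constituent_spans(tree: Tree, start: int = 0,
                      spans: Optional[Dict[Span, str]] = None) -> Tuple[Dict[Span, str], int]:
    """Record the label of every constituent span; return (spans, index after the subtree)."""
    if spans is None:
        spans = {}
    if isinstance(tree, str):          # a leaf covers exactly one word
        return spans, start + 1
    label, *children = tree
    pos = start
    for child in children:
        spans, pos = constituent_spans(child, pos, spans)
    # Children are visited first, so for unary chains (e.g. NP over PRP) the outer label wins.
    spans[(start, pos - 1)] = label
    return spans, pos


def heuristic_label(span: Span, spans: Dict[Span, str]) -> str:
    """SAMT-style label: exact constituent, else 'A+B' for two adjacent constituents, else 'X'."""
    if span in spans:
        return spans[span]
    s, e = span
    for mid in range(s, e):
        left, right = (s, mid), (mid + 1, e)
        if left in spans and right in spans:
            return spans[left] + "+" + spans[right]
    return "X"


if __name__ == "__main__":
    tree = ("S",
            ("NP", ("DT", "the"), ("JJ", "big"), ("NN", "cat")),
            ("VP", ("VBD", "sat"), ("RP", "down")))
    spans, _ = constituent_spans(tree)
    for span in [(0, 2), (2, 4), (1, 3)]:
        print(span, heuristic_label(span, spans))
```

A labeling scheme of this kind lets rules extracted from non-constituent phrases still carry some syntactic information instead of falling back to a single generic nonterminal, which is the general motivation for extending labels in string-to-tree style models.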
