Computer Engineering and Applications, 2008, Vol. 44, Issue (13): 147-149.

• Databases, Signal and Information Processing

Improvement of the phrase extraction algorithm in phrase-based statistical machine translation

QIANG Jing1,2, ZHANG Jian1

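  1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
    2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027
  • Received: 2007-08-21  Revised: 2007-11-15  Publication date: 2008-05-01  Release date: 2008-05-01
  • Corresponding author: QIANG Jing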

Improving phrase-based statistical translation by modifying phrase extraction algorithm

QIANG Jing1,2, ZHANG Jian1

  1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
    2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received: 2007-08-21  Revised: 2007-11-15  Online: 2008-05-01  Published: 2008-05-01
  • Contact: QIANG Jing

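Abstract: This paper proposes an improvement to the phrase extraction algorithm proposed by Och, which is the one commonly used in phrase-based statistical machine translation. Building on the original algorithm, the improved algorithm extracts additional correct alignment information. This matters greatly for Chinese to minority-language statistical machine translation, where corpora are small: more correct alignment information reduces the number of unknown (out-of-vocabulary) words and raises translation accuracy. Experiments on corpora of different sizes show a clear increase in the number of extracted phrase pairs.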

Keywords: statistical machine translation, translation model, phrase extraction
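The baseline that the paper modifies can be sketched as follows. This is a minimal sketch of the standard consistency-based phrase-pair extraction used in phrase-based SMT (the Och-style algorithm), not the authors' improved algorithm, whose details the abstract does not give. The function name `extract_phrases`, the `max_len` limit of 7, and the toy sentences are illustrative assumptions. Each source span is projected onto the target side through the word alignment, and the pair is kept only if it contains at least one alignment link and no word inside it is aligned to a word outside it.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links


def extract_phrases(src: List[str], tgt: List[str], align: Alignment,
                    max_len: int = 7) -> Set[Tuple[str, str]]:
    """Return all phrase pairs consistent with the word alignment.

    Sketch of the standard baseline, not the paper's improved algorithm.
    Unaligned target words at the span boundary are not added here.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tps = [j for (i, j) in align if i1 <= i <= i2]
            if not tps:
                continue  # a phrase pair needs at least one link
            j1, j2 = min(tps), max(tps)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in align):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


if __name__ == "__main__":
    src = ["他", "的", "书"]
    tgt = ["his", "book"]
    align = {(0, 0), (2, 1)}  # 的 (index 1) is unaligned
    for pair in sorted(extract_phrases(src, tgt, align)):
        print(pair)
```

With this toy input, five pairs are extracted, including 他 的 → his and 的 书 → book, but no pair has 的 alone as its source side, because every pair must contain at least one alignment link.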

Abstract: This paper proposes an improved phrase extraction algorithm based on Och's phrase extraction algorithm for phrase-based statistical machine translation. Building on the original algorithm, it extracts additional accurate alignment information, which is of great significance for statistical machine translation with small corpora. The additional correct alignment information reduces the number of unknown words and increases translation accuracy. Experiments on corpora of different sizes show that the number of extracted phrase pairs increases markedly.

Key words: machine translation, translation model, phrase extraction
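The abstract's link between extra phrase pairs and fewer unknown words can be checked with a simple count. The sketch below uses a hypothetical helper, `count_unknown_words`, that is not from the paper; it treats a test-set source token as unknown when the token appears in no source phrase of the table. That is a simplifying assumption: a decoder needs a whole source phrase to match the input, so the count is only a lower bound on the words left untranslated.

```python
from typing import Iterable, List, Set, Tuple

PhraseTable = Set[Tuple[str, str]]  # (source phrase, target phrase) pairs


def count_unknown_words(test_sentences: Iterable[List[str]],
                        phrase_table: PhraseTable) -> int:
    """Count test-set source tokens that occur in no source-side phrase.

    Hypothetical helper for illustration. A token found only inside longer
    phrases may still be untranslatable in a new context, so this is a
    lower bound on the unknown words a decoder would meet.
    """
    # Every word that appears anywhere on the source side of the table.
    known = {w for src_phrase, _ in phrase_table for w in src_phrase.split()}
    return sum(1 for sent in test_sentences for w in sent if w not in known)


if __name__ == "__main__":
    test = [["我", "爱", "你"]]
    small = {("我", "I")}
    larger = small | {("爱", "love")}
    print(count_unknown_words(test, small))   # 2 (爱 and 你)
    print(count_unknown_words(test, larger))  # 1 (你)
```

Adding phrase pairs can only keep this count the same or lower it, because the set of known source words can only grow.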