Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (35): 147-149.

• Database, Signal and Information Processing •

Improved adaptive algorithm for Chinese-Uyghur sentence alignment

TIAN Shengwei¹, YU Long², YANG Feiyu³

  1. School of Software, Xinjiang University, Urumqi 830008, China
  2. Network Center, Xinjiang University, Urumqi 830046, China
  3. International Cultural Exchange College, Xinjiang University, Urumqi 830046, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-12-11 Published: 2011-12-11

Abstract: This paper proposes an improved adaptive algorithm for Chinese-Uyghur sentence alignment. Traditional alignment methods adapt poorly to changes in corpus type. To address this, the algorithm uses the byte-length ratio of the Chinese-Uyghur text currently being aligned, together with historical data on matching patterns, to adjust the parameters of the alignment model dynamically so that the model follows changes in corpus type. This improves alignment performance: compared with the length-based alignment model, accuracy and recall improve by 3.5 and 2.7 percentage points, respectively; compared with the hybrid alignment model, they improve by 1.9 and 1.8 percentage points. The experimental results show that the algorithm adapts effectively to changes in corpus type.

Key words: bilingual corpora, sentence alignment, adaptive
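
The abstract describes the adaptation only in outline, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes a Gale-Church style length-based dynamic-programming aligner, re-estimates the expected Uyghur/Chinese byte-length ratio from the text currently being aligned, and blends assumed starting weights for each match pattern with counts of patterns seen in earlier alignments. All names (`align`, `estimate_ratio`, `pattern_costs`, `BASE_WEIGHTS`, `VARIANCE`, `PRIOR_STRENGTH`) and all numeric constants are illustrative assumptions, as is the use of UTF-8 for byte lengths.

```python
import math
from collections import Counter

# Candidate match patterns (source sentences, target sentences) with
# illustrative starting weights; these numbers are assumptions.
BASE_WEIGHTS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
                (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
VARIANCE = 6.8        # assumed variance of the length mismatch
PRIOR_STRENGTH = 10.0  # assumed weight of the starting prior vs. history


def byte_len(sentence):
    """Sentence length in bytes (UTF-8 is an assumption)."""
    return len(sentence.encode("utf-8"))


def estimate_ratio(src, tgt):
    """Target/source byte-length ratio of the text currently being aligned."""
    s = sum(byte_len(x) for x in src)
    t = sum(byte_len(x) for x in tgt)
    return t / s if s > 0 and t > 0 else 1.0


def pattern_costs(history):
    """Negative log prior per pattern: starting weights blended with history."""
    base_total = sum(BASE_WEIGHTS.values())
    seen = sum(history[p] for p in BASE_WEIGHTS)
    return {p: -math.log((PRIOR_STRENGTH * w / base_total + history[p])
                         / (PRIOR_STRENGTH + seen))
            for p, w in BASE_WEIGHTS.items()}


def length_cost(s_len, t_len, ratio):
    """Negative log two-sided normal tail probability of the length mismatch."""
    if s_len == 0 and t_len == 0:
        return 0.0
    mean = (s_len + t_len / ratio) / 2.0
    delta = (t_len - s_len * ratio) / math.sqrt(mean * VARIANCE)
    return -math.log(max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300))


def align(src, tgt, history):
    """Align two sentence lists; returns (src_indices, tgt_indices) pairs
    and adds the patterns used to `history` for later calls."""
    ratio, prior = estimate_ratio(src, tgt), pattern_costs(history)
    s_len, t_len = [byte_len(x) for x in src], [byte_len(x) for x in tgt]
    n, m = len(src), len(tgt)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            for (a, b), p_cost in prior.items():
                if i + a > n or j + b > m:
                    continue
                cost = best[i][j] + p_cost + length_cost(
                    sum(s_len[i:i + a]), sum(t_len[j:j + b]), ratio)
                if cost < best[i + a][j + b]:
                    best[i + a][j + b] = cost
                    back[i + a][j + b] = (a, b)
    # Trace the cheapest path back from the end of both texts.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        pairs.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    pairs.reverse()
    history.update((len(s), len(t)) for s, t in pairs)
    return pairs
```

Passing the same `Counter` to successive `align(chinese_sentences, uyghur_sentences, history)` calls lets the match patterns found in earlier texts shape the priors used for later ones, while the length ratio is recomputed for each new text.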