Computer Engineering and Applications, 2010, Vol. 46, Issue (34): 143-145. DOI: 10.3778/j.issn.1002-8331.2010.34.043

• Database, Signal and Information Processing •

Chinese-Uyhur sentence alignment based on hybrid strategy

TIAN Sheng-wei 1, TURGUN Ibrahim 1, YU Long 2

  1. Information Science and Engineering Technology Institute, Xinjiang University, Urumqi 830046, China
  2. Network Center, Xinjiang University, Urumqi 830046, China
  • Received: 2009-04-10 Revised: 2009-06-05 Online: 2010-12-01 Published: 2010-12-01
  • Contact: TIAN Sheng-wei

Abstract: This paper proposes a hybrid algorithm for aligning Chinese-Uyghur sentences in parallel texts. The approach requires no Chinese preprocessing such as word segmentation or part-of-speech tagging. Instead, it uses lexical co-occurrence information in the bilingual corpus to automatically extract Chinese-Uyghur lexical correspondences, which serve as the lexicon for lexicon-based alignment, and combines this with a length-based method. Experiments confirm the effectiveness of the hybrid algorithm: Chinese-Uyghur sentence alignment reaches 97.5% precision and 97.1% recall.
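The abstract gives no formulas, so the following is only a minimal sketch of how a co-occurrence lexicon and a length cue might be combined to score one candidate sentence pair. Everything in it is an assumption made for illustration: the function names, the Dice coefficient as the co-occurrence measure, the character n-gram treatment of the unsegmented Chinese side, the thresholds, and the linear weighting. The search over candidate pairs that the paper would need (for example dynamic programming) is not shown.

```python
from collections import Counter
from itertools import product


def char_ngrams(text, n_max=2):
    """Character n-grams of the Chinese side, so no word segmentation is needed."""
    chars = [c for c in text if not c.isspace()]
    return {
        "".join(chars[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(chars) - n + 1)
    }


def extract_lexicon(pairs, min_dice=0.8, min_count=2):
    """Build a lexicon from co-occurrence counts over roughly parallel segments.

    pairs: list of (chinese_text, target_text), target side whitespace-tokenised.
    The Dice coefficient and both thresholds are assumed, not taken from the paper.
    """
    zh_count, tgt_count, co_count = Counter(), Counter(), Counter()
    for zh, tgt in pairs:
        zh_units, tgt_words = char_ngrams(zh), set(tgt.split())
        zh_count.update(zh_units)
        tgt_count.update(tgt_words)
        co_count.update(product(zh_units, tgt_words))
    lexicon = {}
    for (z, t), c in co_count.items():
        dice = 2 * c / (zh_count[z] + tgt_count[t])
        if c >= min_count and dice >= min_dice:
            lexicon.setdefault(z, set()).add(t)
    return lexicon


def length_score(zh, tgt, ratio=1.0):
    """Score in (0, 1]; 1.0 when len(tgt) equals ratio * len(zh). ratio is assumed."""
    expected, actual = ratio * len(zh), len(tgt)
    if expected == 0 or actual == 0:
        return 0.0
    return min(expected, actual) / max(expected, actual)


def lexical_score(zh, tgt, lexicon):
    """Fraction of lexicon entries found in zh that have a translation in tgt."""
    tgt_words = set(tgt.split())
    found = [z for z in lexicon if z in zh]
    if not found:
        return 0.0
    return sum(1 for z in found if lexicon[z] & tgt_words) / len(found)


def pair_score(zh, tgt, lexicon, weight=0.5, ratio=1.0):
    """Hybrid score: weighted sum of the length cue and the lexical cue."""
    return weight * length_score(zh, tgt, ratio) + (1 - weight) * lexical_score(zh, tgt, lexicon)
```

An aligner built on this would compare `pair_score` across candidate pairings and keep the best-scoring ones. Matching lexicon entries as substrings of the raw Chinese sentence is what lets the sketch skip segmentation, at the cost of some spurious matches that the thresholds are meant to limit.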
