Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (26): 144-146.DOI: 10.3778/j.issn.1002-8331.2009.26.042

• Database and Information Processing •

Similarity measure algorithm of Uyhur sentence

TIAN Sheng-wei¹, Turgun Ibrahim¹, YU Long², Mahmut Muhammad¹, Hasan Uma¹

  1. Information Science and Engineering Technology Institute, Xinjiang University, Urumqi 830046, China
    2. Network Center, Xinjiang University, Urumqi 830046, China
  • Received:2008-06-03 Revised:2008-07-28 Online:2009-09-11 Published:2009-09-11
  • Contact: TIAN Sheng-wei

Research on a sentence similarity algorithm for Uyghur

田生伟¹, 吐尔根·依布拉音¹, 禹龙², 买合木提·木合买提¹, 艾山·吾买尔¹

  1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046
    2. Network Center, Xinjiang University, Urumqi 830046
  • Corresponding author: 田生伟

Abstract: Example-based machine translation (EBMT) is an important branch of machine translation, and sentence similarity measurement is one of its central problems: the accuracy of the similarity measure for Uyghur sentences directly affects the final translation of an input sentence. This paper proposes a similarity measure algorithm for Uyghur sentences. A retrieval step based on word surface features and an inverted word index quickly selects a set of candidate translation examples from the corpus. A multilayer similarity measure, based on Uyghur word frequency and on contiguous word sequences, then measures the similarity of two Uyghur sentences. The test results show that the algorithm is effective.

Key words: machine translation, example-based machine translation, Uyghur sentence similarity

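Abstract (from the Chinese text): Example-based machine translation is an important machine translation technique, and measuring sentence similarity is one of the most important topics in example-based machine translation research. For example-based Uyghur machine translation, the accuracy of the Uyghur sentence similarity measure directly affects the final translation output. This paper proposes a method for computing Uyghur sentence similarity. Its coarse selection algorithm, based on word-form features, and its hashed inverted word index effectively speed up lookup and quickly filter a set of candidate sentences from the corpus. Its multi-strategy fine selection algorithm uses a word discrimination algorithm based on Uyghur word frequency and a contiguous word sequence extraction algorithm, which together measure the degree of similarity between two Uyghur sentences. Experimental results show that the algorithm is effective.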

Key words (from the Chinese text): machine translation, example-based machine translation, Uyghur sentence similarity
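
The two-stage approach described in the abstract (coarse candidate retrieval through an inverted word index, then a finer similarity score that combines frequency-based word weights with shared contiguous word sequences) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the smoothed inverse-document-frequency weighting, the weighted-overlap ratio, the mixing parameter `alpha`, whitespace tokenization, and all function names. The tokens in the usage note are placeholders, not Uyghur words.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(corpus):
    """Map each word to the set of ids of the sentences that contain it."""
    index = defaultdict(set)
    for sent_id, sentence in enumerate(corpus):
        for word in sentence.split():
            index[word].add(sent_id)
    return index


def coarse_select(query, index, top_k=3):
    """Coarse step: rank sentences by how many distinct query words they share."""
    hits = Counter()
    for word in set(query.split()):
        for sent_id in index.get(word, ()):
            hits[sent_id] += 1
    return [sent_id for sent_id, _ in hits.most_common(top_k)]


def word_weights(corpus):
    """Assumed weighting (not from the paper): rarer words weigh more."""
    n = len(corpus)
    doc_freq = Counter(w for s in corpus for w in set(s.split()))
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in doc_freq.items()}


def longest_common_run(a, b):
    """Length of the longest contiguous word sequence shared by lists a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def similarity(query, candidate, weights, alpha=0.7):
    """Fine step: weighted word overlap mixed with a contiguous-run score."""
    q, c = query.split(), candidate.split()
    if not q or not c:
        return 0.0
    # Words never seen in the corpus are treated as maximally rare.
    default = max(weights.values(), default=1.0)

    def weight(token):
        return weights.get(token, default)

    shared = set(q) & set(c)
    union = set(q) | set(c)
    overlap = sum(weight(t) for t in shared) / sum(weight(t) for t in union)
    run = longest_common_run(q, c) / max(len(q), len(c))
    return alpha * overlap + (1 - alpha) * run


def most_similar(query, corpus, index, weights, top_k=3):
    """Return (score, sentence id) of the best candidate, or (0.0, None)."""
    candidates = coarse_select(query, index, top_k)
    scored = [(similarity(query, corpus[i], weights), i) for i in candidates]
    return max(scored, default=(0.0, None))
```

With a toy corpus such as `["a b c d", "a b x y", "p q r"]`, one would build the index and weights once, then call `most_similar("a b c", corpus, index, weights)`. The coarse step only touches sentences that share at least one word with the query, which is the reason an inverted index speeds up retrieval over scanning the whole corpus.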
