Similarity measure algorithm of Uyhur sentence

doi:10.3778/j.issn.1002-8331.2009.26.042

Computer Engineering and Applications, 2009, Vol. 45, Issue (26): 144-146.

• Database and Information Processing •

TIAN Sheng-wei¹, Turgun Ibrahim¹, YU Long², Mahmut Muhammad¹, Hasan Uma¹

1. Information Science and Engineering Technology Institute, Xinjiang University, Urumqi 830046, China
2. Network Center, Xinjiang University, Urumqi 830046, China

Received: 2008-06-03  Revised: 2008-07-28  Online: 2009-09-11  Published: 2009-09-11
Contact: TIAN Sheng-wei

Abstract

Abstract: Example-based machine translation (EBMT) is an important branch of machine translation, and sentence similarity measurement is one of its central problems. In Uyghur EBMT, the accuracy of the similarity measure between Uyghur sentences directly affects the final translation of an input sentence. This paper proposes a similarity measure algorithm for Uyghur sentences. A coarse selection step, based on word surface features and a hashed inverted word index, rapidly retrieves a set of candidate translation examples from the corpus. A multi-strategy fine selection step, which combines a word discrimination measure based on Uyghur word frequency with the extraction of continuous word sequences, then measures how similar two Uyghur sentences are. Experimental results show that the algorithm is effective.

Key words: machine translation, example-based machine translation, Uyghur sentence similarity

CLC Number: TP391
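
The two-stage idea summarized in the abstract (fast candidate retrieval through an inverted word index, followed by a finer similarity score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the whitespace tokenization, the IDF-style weight standing in for the frequency-based word discrimination, the longest common run standing in for continuous word sequence extraction, the mixing parameter `alpha`, and all function names.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(corpus):
    """Map each word to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sent_id, sentence in enumerate(corpus):
        for word in sentence.split():  # assumption: whitespace tokenization
            index[word].add(sent_id)
    return index


def coarse_select(query, index, top_k=5):
    """Coarse step: rank sentences by the number of distinct query words they share."""
    hits = Counter()
    for word in set(query.split()):
        for sent_id in index.get(word, ()):
            hits[sent_id] += 1
    return [sent_id for sent_id, _ in hits.most_common(top_k)]


def word_weights(corpus):
    """Assumed IDF-style weights: words found in fewer sentences discriminate more."""
    n = len(corpus)
    doc_freq = Counter(w for s in corpus for w in set(s.split()))
    return {w: math.log((n + 1) / (df + 1)) + 1.0 for w, df in doc_freq.items()}


def longest_common_run(a, b):
    """Length of the longest run of consecutive words shared by lists a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for wa in a:
        cur = [0] * (len(b) + 1)
        for j, wb in enumerate(b, start=1):
            if wa == wb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def fine_similarity(query, candidate, weights, alpha=0.7):
    """Fine step: mix weighted word overlap with a continuous-sequence score."""
    q, c = query.split(), candidate.split()
    if not q or not c:
        return 0.0
    default = max(weights.values(), default=1.0)  # unseen words count as rare

    def weight(word):
        return weights.get(word, default)

    shared = set(q) & set(c)
    total = sum(weight(w) for w in set(q)) + sum(weight(w) for w in set(c))
    overlap = 2 * sum(weight(w) for w in shared) / total
    run = longest_common_run(q, c) / max(len(q), len(c))
    return alpha * overlap + (1 - alpha) * run


# Placeholder tokens stand in for Uyghur words.
corpus = ["a b c d", "a b x y", "p q r s"]
index = build_inverted_index(corpus)
weights = word_weights(corpus)
query = "a b c z"
candidates = coarse_select(query, index)
ranked = sorted(candidates,
                key=lambda i: fine_similarity(query, corpus[i], weights),
                reverse=True)
```

Only the sentences returned by `coarse_select` are scored in the fine step, which is what keeps the expensive comparison off the bulk of the corpus. A real system would replace the whitespace split with Uyghur-specific word processing.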

TIAN Sheng-wei¹, Turgun Ibrahim¹, YU Long², Mahmut Muhammad¹, Hasan Uma¹. Similarity measure algorithm of Uyhur sentence[J]. Computer Engineering and Applications, 2009, 45(26): 144-146.
