Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (15): 118-120. DOI: 10.3778/j.issn.1002-8331.2010.15.035

• Database, Signal and Information Processing

Noisy channel based Uyghur neutralized vowel identification model

AISHAN Wumaier, TUERGEN Yibulayin, ZAOKERE Kadeer

  1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Received:2009-04-27 Revised:2009-06-18 Online:2010-05-21 Published:2010-05-21
  • Contact: AISHAN Wumaier

Abstract: In Uyghur, attaching an inflectional suffix to a word often weakens (neutralizes) a stem vowel into a central vowel. When an inflected word is stemmed, rule-based identification of the original vowel behind a neutralized vowel reaches a precision of only about 40%. To address this problem, a noisy-channel-based model for identifying the original vowels of neutralized Uyghur vowels is proposed. The model uses the last two letters, the last three letters and the last syllable of the weakened stem as context, and builds its language model and likelihood computation on these contexts. In an open test, the model reached a precision of 82.45% and improved stemming precision by about 15%.

Key words: noisy channel, Uyghur, vowel harmony, stemming, neutralized vowel
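
The abstract does not give the model's formulas. As a rough illustration only, the noisy-channel decision it describes can be read as choosing, for a neutralized vowel o in a weakened stem, the original vowel ĉ = argmax_c P(c | context) · P(o | c), where the "language model" P(c | context) is conditioned on the ending of the weakened stem and P(o | c) is the channel likelihood. The Python sketch below is a hypothetical implementation under stated assumptions: the neutralized vowel is the last vowel of the stem; only the last-two-letter and last-three-letter contexts are used (the last-syllable context is omitted because it needs a syllabifier); the two contexts are linearly interpolated with an assumed weight `lam`; counts use add-alpha smoothing; and the vowel set is a simplified ASCII stand-in. The class `NoisyChannelVowelRestorer`, its methods and the training pairs are invented for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, simplified ASCII vowel inventory for this toy example only;
# a real system would use the full Uyghur vowel set of the corpus's script.
VOWELS = set("aeiou")


def last_vowel_index(stem: str) -> int:
    """Index of the last vowel, assumed (hypothetically) to be the weakened one."""
    for i in range(len(stem) - 1, -1, -1):
        if stem[i] in VOWELS:
            return i
    return -1


class NoisyChannelVowelRestorer:
    """Pick the original vowel c maximizing P(c | ending context) * P(observed | c)."""

    def __init__(self, lam: float = 0.6, alpha: float = 1.0):
        self.lam = lam      # assumed weight of the 3-letter context vs the 2-letter one
        self.alpha = alpha  # add-alpha smoothing constant
        self.ctx2 = defaultdict(Counter)     # last 2 letters -> counts of original vowel
        self.ctx3 = defaultdict(Counter)     # last 3 letters -> counts of original vowel
        self.channel = defaultdict(Counter)  # original vowel -> counts of observed vowel
        self.candidates = set()

    def train(self, pairs):
        """pairs: (weakened_stem, original_stem), same length, differing at the last vowel."""
        for weak, orig in pairs:
            i = last_vowel_index(weak)
            if i < 0:
                continue
            obs_v, orig_v = weak[i], orig[i]
            self.candidates.add(orig_v)
            self.channel[orig_v][obs_v] += 1
            self.ctx2[weak[-2:]][orig_v] += 1
            self.ctx3[weak[-3:]][orig_v] += 1

    def _smoothed(self, counts: Counter, item: str, support: int) -> float:
        """Add-alpha estimate of P(item) from counts over `support` possible outcomes."""
        total = sum(counts.values())
        return (counts[item] + self.alpha) / (total + self.alpha * support)

    def restore(self, weak: str) -> str:
        """Return the stem with its last vowel replaced by the most probable original."""
        i = last_vowel_index(weak)
        if i < 0 or not self.candidates:
            return weak
        obs_v = weak[i]
        n = len(self.candidates)

        def score(c: str) -> float:
            # "Language model": interpolated P(c | last 3, last 2 letters of weakened stem).
            p_lm = (self.lam * self._smoothed(self.ctx3[weak[-3:]], c, n)
                    + (1 - self.lam) * self._smoothed(self.ctx2[weak[-2:]], c, n))
            # Channel likelihood P(observed vowel | original vowel c).
            p_ch = self._smoothed(self.channel[c], obs_v, len(VOWELS))
            return math.log(p_lm) + math.log(p_ch)

        best = max(sorted(self.candidates), key=score)
        return weak[:i] + best + weak[i + 1:]


if __name__ == "__main__":
    # Invented toy pairs (weakened stem, original stem), not corpus data.
    toy_pairs = [("bali", "bala"), ("ati", "ata"), ("ani", "ana"),
                 ("seli", "sele"), ("teli", "tele"), ("meli", "mele")]
    model = NoisyChannelVowelRestorer()
    model.train(toy_pairs)
    print(model.restore("kali"))  # kala
    print(model.restore("peli"))  # pele
```

With these toy pairs, "kali" is restored to "kala" because its three-letter ending "ali" was seen only with an original "a", which outweighs the two-letter ending "li" favoring "e"; "peli" is restored to "pele" because "eli" was seen only with "e". Interpolating the longer and shorter contexts is one simple way to back off to the two-letter ending when the three-letter ending is rare or unseen.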
