Computer Engineering and Applications, 2011, Vol. 47, Issue (35): 158-160.

• Database, Signal and Information Processing •

Hybrid algorithm of polyphonic word disambiguation in Uyghur language

Guljamal Mamateli (1), Askar Rozi (2), Askar Hamdulla (1)

  1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  2. Institute of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
  • Online: 2011-12-11  Published: 2011-12-11

Abstract: The correct pronunciation of polyphonic words (words that are spelled the same but pronounced differently) is one of the key factors affecting the intelligibility of Uyghur speech synthesis. A Uyghur word consists of a stem and affixes; although there are only a few polyphonic stems, attaching various affixes to them produces a large number of polyphonic words. This paper studies 16 polyphonic stems that are frequently used and often mispronounced in Uyghur. Starting from the different characteristics of these polyphones, it applies different rules to them and uses a maximum entropy model to disambiguate the polyphonic words that the rules do not cover. In addition, the log-likelihood ratio is used to extract keywords, and a greedy algorithm is used to select the best feature template set. Performance tests show that the method reaches an average polyphonic word disambiguation precision of 87.7%.
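The abstract describes a hybrid pipeline (rules first, a maximum entropy model for cases the rules miss, greedy selection of feature templates) without giving its details. The sketch below is one possible reading, under stated assumptions: a rule is a function that returns a reading or `None` when it does not apply; the maxent model is a plain conditional maximum entropy (multinomial logistic regression) classifier trained by batch gradient ascent, which may differ from the paper's training procedure; greedy selection is forward selection scored by accuracy on a held-out set. All names (`TEMPLATES`, `disambiguate`, `greedy_select`, readings `R1`/`R2`, affixes `a1`/`a2`, stem `STEM`) are hypothetical placeholders, not real Uyghur data.

```python
import math
from collections import defaultdict

# Hypothetical feature templates over a context dict; the paper's actual
# templates are not given in the abstract.
TEMPLATES = {
    "prev": lambda ctx: "prev=" + ctx["prev"],
    "affix": lambda ctx: "affix=" + ctx["affix"],
}


def extract_features(ctx, templates):
    return [TEMPLATES[t](ctx) for t in templates]


class MaxEnt:
    """Conditional maximum entropy (multinomial logistic) classifier over
    binary string features, trained by batch gradient ascent on the
    L2-regularized conditional log-likelihood."""

    def __init__(self, epochs=200, lr=0.5, l2=0.01):
        self.epochs, self.lr, self.l2 = epochs, lr, l2
        self.w = defaultdict(float)  # (feature, label) -> weight
        self.labels = []

    def prob(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract max for numerical stability
        z = sum(math.exp(s - top) for s in scores.values())
        return {y: math.exp(s - top) / z for y, s in scores.items()}

    def fit(self, data):
        self.labels = sorted({y for _, y in data})
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, y in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(f, y)] += 1.0  # observed feature count
                    for c in self.labels:
                        grad[(f, c)] -= p[c]  # model-expected feature count
            for k in set(grad) | set(self.w):
                self.w[k] += self.lr * (grad[k] - self.l2 * self.w[k]) / len(data)
        return self

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)


def disambiguate(stem, ctx, rules, model, templates):
    """Hybrid step: try the stem's hand-written rules first and fall back
    to the maxent model only when no rule fires (every rule returns None)."""
    for rule in rules.get(stem, []):
        reading = rule(ctx)
        if reading is not None:
            return reading
    return model.predict(extract_features(ctx, templates))


def accuracy(train, dev, templates):
    data = [(extract_features(c, templates), y) for c, y in train]
    model = MaxEnt().fit(data)
    hits = sum(model.predict(extract_features(c, templates)) == y for c, y in dev)
    return hits / len(dev)


def greedy_select(train, dev, candidates):
    """Greedy forward selection: repeatedly add the template that most
    improves dev accuracy; stop when no remaining template improves it."""
    chosen, best = [], -1.0
    while True:
        pick = None
        for t in candidates:
            if t not in chosen:
                acc = accuracy(train, dev, chosen + [t])
                if acc > best:
                    best, pick = acc, t
        if pick is None:
            return chosen, best
        chosen.append(pick)
```

On toy data where the affix alone determines the reading and `prev` is constant, `greedy_select(train, dev, ["prev", "affix"])` keeps only `["affix"]`, since adding `prev` cannot raise dev accuracy above 1.0. Forward selection is cheap to reason about but can miss template combinations that only help jointly.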
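The abstract states that the log-likelihood ratio is used to extract keywords but does not give the exact formulation. A minimal sketch, assuming Dunning's G² statistic over a 2×2 table that crosses "context contains word w" with "the polyphonic stem takes pronunciation c", counted once per context. Because G² measures association strength but not direction, the sketch keeps only words that occur proportionally more often with the target pronunciation. The function names and the toy tokens `x`, `y`, `z`, `w` with labels `A`/`B` are hypothetical.

```python
import math


def llr_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio G^2 = 2 * sum O * ln(O / E) for a 2x2
    contingency table; cells with O = 0 contribute 0."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i, row in enumerate(((k11, k12), (k21, k22))):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = rows[i] * cols[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def extract_keywords(samples, label, top_k=5):
    """Rank context words by G^2 association with one pronunciation label.

    samples: list of (iterable_of_context_words, pronunciation_label).
    """
    in_label = [set(ws) for ws, y in samples if y == label]
    out_label = [set(ws) for ws, y in samples if y != label]
    vocab = set().union(*in_label, *out_label)
    scored = []
    for word in vocab:
        k11 = sum(word in ws for ws in in_label)   # with label, has word
        k12 = len(in_label) - k11                  # with label, lacks word
        k21 = sum(word in ws for ws in out_label)  # other labels, has word
        k22 = len(out_label) - k21                 # other labels, lacks word
        # Keep positive associations only: k11/|in| > k21/|out|.
        if k11 * len(out_label) > k21 * len(in_label):
            scored.append((llr_2x2(k11, k12, k21, k22), word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_k]]
```

For example, a table of (3, 0, 0, 3) gives G² = 12 ln 2, while a perfectly independent table such as (2, 2, 2, 2) gives 0. The extracted keywords could then serve as additional binary features for the maximum entropy model.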

Key words: Uyghur language, polyphonic word, maximum entropy model
