Hybrid algorithm of polyphonic word disambiguation in Uyghur language

Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (35): 158-160.

• Database, Signal and Information Processing •

Guljamal Mamateli¹, Askar Rozi², Askar Hamdulla¹

1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Institute of Mathematics and System Science, Xinjiang University, Urumqi 830046, China

Online: 2011-12-11 Published: 2011-12-11

Abstract

The correct pronunciation of polyphonic words (words that are spelled identically but pronounced differently) is one of the important factors affecting the intelligibility of Uyghur speech synthesis. A Uyghur word consists of a stem and affixes. Although there are only a few polyphonic stems, a large number of polyphonic words are formed by attaching affixes to them. This paper studies 16 polyphonic stems that are frequently used and often mispronounced in Uyghur. Starting from the distinct features of each polyphone, it applies different rules and uses a maximum entropy model to disambiguate the polyphonic words that the rules do not cover. In addition, the log-likelihood ratio is used to extract keywords, and a greedy algorithm is used to select the best feature set. Performance tests show that the average precision of polyphonic word disambiguation reaches 87.7%.

Key words: Uyghur language, polyphonic word, maximum entropy model
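
The abstract names two generic building blocks: log-likelihood ratio scoring for keyword extraction and greedy selection of a feature set. The sketch below illustrates both in Python under stated assumptions; it is not the authors' implementation. The function names, the 2x2 contingency-table layout, and the toy scoring function in the usage note are all hypothetical, and the log-likelihood ratio is taken here to be the standard G² statistic, which the abstract does not confirm.

```python
import math
from typing import Callable, Iterable, List, Set


def _xlogx(x: float) -> float:
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 contingency table (assumed layout):

    k11: contexts containing the keyword, pronunciation A
    k12: contexts containing the keyword, pronunciation B
    k21: contexts without the keyword, pronunciation A
    k22: contexts without the keyword, pronunciation B
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells + _xlogx(n) - rows - cols))


def greedy_select(
    candidates: Iterable[str],
    score: Callable[[Set[str]], float],
) -> List[str]:
    """Greedy forward selection: repeatedly add the candidate feature that
    raises score() the most, and stop when no candidate improves it."""
    selected: List[str] = []
    remaining = list(candidates)
    best = score(set())
    while remaining:
        trials = [(score(set(selected) | {c}), c) for c in remaining]
        top_score, top = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected
```

In practice, `score` would train and evaluate a classifier on held-out data for each candidate feature set. A keyword that occurs equally often with both pronunciations gets a score near zero from `log_likelihood_ratio`, while one that occurs with only one pronunciation gets a large score.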

Guljamal Mamateli, Askar Rozi, Askar Hamdulla. Hybrid algorithm of polyphonic word disambiguation in Uyghur language[J]. Computer Engineering and Applications, 2011, 47(35): 158-160.
