Hybrid algorithm of polyphonic word disambiguation in Uyghur language

Computer Engineering and Applications, 2011, Vol. 47, Issue 35: 158-160.

• Database, Signal and Information Processing •

Guljamal Mamateli¹, Askar Rozi², Askar Hamdulla¹

1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Institute of Mathematics and System Science, Xinjiang University, Urumqi 830046, China

Online: 2011-12-11  Published: 2011-12-11

Abstract

Correct pronunciation of polyphonic words (words that share one written form but have different pronunciations) is one of the important factors affecting the intelligibility of Uyghur speech synthesis. A Uyghur word consists of a stem and affixes. Although there are only a few polyphonic stems, attaching various affixes to them produces a large number of polyphonic words. This paper studies 16 polyphonic stems that are frequently used and often mispronounced in Uyghur. Based on the distinct characteristics of the individual polyphonic words, it applies different rules and uses a maximum entropy model to disambiguate the polyphonic words that the rules do not cover. In addition, the log-likelihood ratio is used to select keywords, and a greedy algorithm is used to select the best feature templates. Performance tests show that the algorithm reaches an average polyphonic word disambiguation precision of 87.7%.

Key words: Uyghur language, polyphonic word, maximum entropy model
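The abstract names two generic techniques, log-likelihood ratio keyword selection and greedy feature template selection, without giving their details. The sketch below shows the textbook form of each and is not the authors' implementation. The 2x2 contingency-table layout, the function names `log_likelihood_ratio` and `greedy_select`, and the `evaluate` callback (which in a real system might be the held-out accuracy of a maximum entropy model) are all assumptions made for illustration.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed layout):

    k11: contexts with pronunciation A that contain the candidate keyword
    k12: contexts with pronunciation A that do not contain it
    k21: contexts with other pronunciations that contain it
    k22: contexts with other pronunciations that do not contain it
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def greedy_select(candidates, evaluate):
    """Greedy forward selection over candidate feature templates.

    `evaluate` is a hypothetical callback that scores a list of templates;
    higher is better. At each step the template that raises the score the
    most is added, and the loop stops when no template improves it.
    """
    selected = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        score, choice = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected, best_score
```

A candidate keyword with a high `log_likelihood_ratio` score co-occurs with one pronunciation far more often than chance would predict, so ranking context words by this score and keeping the top ones is one plausible way to build a keyword list. `greedy_select` trades optimality for speed: it evaluates each remaining template once per round instead of searching every subset.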

Guljamal Mamateli, Askar Rozi, Askar Hamdulla. Hybrid algorithm of polyphonic word disambiguation in Uyghur language[J]. Computer Engineering and Applications, 2011, 47(35): 158-160.
