Uyghur noun stemming system based on hybrid method

Computer Engineering and Applications, 2013, Vol. 49, Issue (1): 171-175.

ZAOKERE Kadeer (1,2), AISHAN Wumaier (1,2), TUERGEN Yibulayin (1,2), PARIDA Tursun (2,3), WU Xiaochuan (1,2)

1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Xinjiang Laboratory of Multi-language Information Technology, Urumqi 830046, China
3. School of Software, Xinjiang University, Urumqi 830046, China

Online: 2013-01-01  Published: 2013-01-16

Abstract

This paper studies Uyghur noun stemming. Based on an analysis of Uyghur noun morphology, a finite state machine (FSM) for nouns is constructed. The errors made by the FSM are then analyzed, and, according to their characteristics, the FSM is combined with a maximum entropy model to disambiguate ambiguous suffixes. Finally, a noisy channel model is used to resolve vowel neutralization. Combining these three models yields a rule- and statistics-based stemming method. To make effective use of existing resources and improve system performance, a dictionary-based approach is also integrated into the Uyghur noun stemming system. The resulting system is robust, and its precision remains above 95%.
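
The abstract does not spell out the FSM, so the following is only a rough illustration under stated assumptions. It assumes the common noun template stem + plural + possessive + case and a tiny, hypothetical suffix inventory written in Uyghur Latin script (ULY). The names `SLOTS`, `MIN_STEM`, `analyses` and `stem` are ours, not the authors'. The sketch enumerates every path through a right-to-left suffix-stripping automaton, which shows the kind of ambiguity the maximum entropy model is meant to resolve. When a lexicon is supplied, stems found in it are preferred, which is a simple form of the dictionary-based strategy. Otherwise a crude "strip the most suffixes" rule stands in for the statistical disambiguator, which is not reproduced here.

```python
from typing import AbstractSet, List, Tuple

# Hypothetical, deliberately tiny suffix inventory in Uyghur Latin script (ULY).
# Slots are listed in surface order: stem + PL + POSS + CASE.
SLOTS = [
    ("PL", ("lar", "ler")),
    ("POSS", ("imiz", "ing", "im", "si", "i")),
    ("CASE", ("ning", "din", "tin", "gha", "qa", "ge", "ke",
              "ni", "da", "de", "ta", "te")),
]
MIN_STEM = 2  # assumed minimum stem length, blocks over-stripping


def analyses(word: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Enumerate every (stem, suffixes) path through a right-to-left FSM.

    The state is the index of the last slot stripped. From state k the
    machine may strip a suffix of any slot j < k (so each slot is used at
    most once, in reverse surface order), or accept the rest as the stem.
    """
    results: List[Tuple[str, Tuple[str, ...]]] = []

    def walk(rest: str, state: int, stripped: Tuple[str, ...]) -> None:
        results.append((rest, stripped))  # accepting path: rest is the stem
        for j in range(state - 1, -1, -1):
            for suffix in SLOTS[j][1]:
                if rest.endswith(suffix) and len(rest) - len(suffix) >= MIN_STEM:
                    walk(rest[: -len(suffix)], j, (suffix,) + stripped)

    walk(word, len(SLOTS), ())
    return results


def stem(word: str, lexicon: AbstractSet[str] = frozenset()) -> str:
    """Prefer stems found in the lexicon, then fall back to a heuristic."""
    candidates = analyses(word)
    in_lexicon = [c for c in candidates if c[0] in lexicon]
    pool = in_lexicon or candidates
    # Stand-in heuristic (NOT the paper's maximum entropy model):
    # prefer the analysis that strips the most suffixes.
    return max(pool, key=lambda c: len(c[1]))[0]


if __name__ == "__main__":
    for s, sfx in analyses("kitablarning"):
        print(s, sfx)
    print(stem("kishining"), stem("kishining", {"kishi"}))
```

Take `analyses("kitablarning")` (kitab "book" + -lar + -ning) as an example. It yields four candidate segmentations, and one of them is the spurious `kitablarn` + `-ing`. For kishi "person", the fallback rule wrongly strips `stem("kishining")` down to `kish`, while `stem("kishining", {"kishi"})` returns `kishi`. This is why the automaton needs a dictionary or a trained model on top of it.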

Key words: Uyghur, agglutinative, Finite State Machine (FSM), noisy channel, stemming

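Abstract (Chinese version, translated): Based on a study of the morphological structure of Uyghur nouns, a finite state automaton (FSM) for nouns is constructed. To address the automaton's shortcomings, a maximum entropy model is used to give the FSM the ability to recognize ambiguous suffixes. Based on the vowel harmony characteristics of Uyghur, a vowel harmony processing method built on rules and a noisy channel model is established. These three methods are combined into a rule- and statistics-based noun stemming method. To make effective use of existing resources and improve system performance, dictionary-based stemming is combined with this rule- and statistics-based method, yielding a multi-strategy Uyghur noun stemming system. The system is robust, and its accuracy remains above 95%.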
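
The noisy channel step can be pictured with a small example. When certain suffixes attach, a stem's last vowel a or e may surface in a weakened form, roughly as i or é. The English abstract calls this alternation vowel neutralization. For example, bala "child" + -si gives balisi, and ana "mother" + -si gives anisi. Stripping the suffix therefore leaves a non-word such as bali or ani. The toy decoder below treats the observed stem as a noisy version of a dictionary form w and picks argmax P(w)·P(observed | w). It is a sketch under stated assumptions, not the authors' model. The candidate generator restores only the last vowel to a or e. The channel probabilities (`P_KEEP`, `P_WEAKEN`) and the word counts (`TOY_COUNTS`) are invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, List

VOWELS = set("aeéiouöü")       # ULY vowel letters
WEAKENED = {"i", "é"}           # assumed surface forms of a weakened a/e
P_KEEP, P_WEAKEN = 0.7, 0.3     # toy channel probabilities, not estimated

# Invented frequencies standing in for a corpus-derived prior P(w).
TOY_COUNTS: Dict[str, int] = {"bala": 120, "ana": 80, "at": 50, "kishi": 200}


def candidates(observed: str) -> List[str]:
    """Observed stem plus versions with its last vowel restored to a or e."""
    out = [observed]
    last = max((k for k, ch in enumerate(observed) if ch in VOWELS), default=-1)
    if last >= 0 and observed[last] in WEAKENED:
        for original in "ae":
            out.append(observed[:last] + original + observed[last + 1:])
    return out


def restore(observed: str, counts: Dict[str, int]) -> str:
    """Noisy-channel decoding: argmax over w of P(w) * P(observed | w)."""
    def score(w: str) -> float:
        prior = counts.get(w, 0) + 1  # add-one; the normaliser cancels in argmax
        channel = P_KEEP if w == observed else P_WEAKEN
        return prior * channel

    return max(candidates(observed), key=score)


if __name__ == "__main__":
    for obs in ("bali", "ani", "ét", "kishi", "kitab"):
        print(obs, "->", restore(obs, TOY_COUNTS))
```

With these toy numbers, `restore("bali", TOY_COUNTS)` returns "bala". By contrast, `restore("kishi", TOY_COUNTS)` keeps "kishi", because its prior outweighs the alternatives that assume weakening. A real system would estimate both the prior and the channel from a corpus.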

ZAOKERE Kadeer, AISHAN Wumaier, TUERGEN Yibulayin, PARIDA Tursun, WU Xiaochuan. Uyghur noun stemming system based on hybrid method[J]. Computer Engineering and Applications, 2013, 49(1): 171-175.
