Study and implementation of Kazakh lexical scanner

Computer Engineering and Applications, 2008, Vol. 44, Issue (19): 146-149.

• Database, Signal and Information Processing •

DAWEL Abilhaye, GULILA Altenbek

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Received: 2007-10-29 Revised: 2008-02-21 Online: 2008-07-01 Published: 2008-07-01
Contact: DAWEL Abilhaye

Abstract

This paper studies stem extraction and affix segmentation in automatic Kazakh morphological analysis and presents KazStemmer, a system that automatically performs stem segmentation and tagging on Kazakh corpora. Each word to be segmented is first analyzed with a finite-state machine (FSM); if the FSM analysis succeeds, its output is taken as the segmentation result. Otherwise, an improved method that combines bidirectional matching, omni-word (full) segmentation, and morphological analysis separates the stem from its affixes. Compared with the maximum matching algorithm, this method achieves higher stem-extraction precision and faster segmentation. In addition, the authors apply an improved binary-seek-by-character (letter-by-letter binary search) dictionary lookup to the stem table, which they describe as its first use for this purpose; the performance of this lookup has a significant effect on segmentation speed.
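The abstract does not specify the FSM, the dictionary layout, or how conflicting segmentations are resolved, so the following Python sketch only illustrates the described control flow under stated assumptions. Everything in it is hypothetical: a tiny stem list and affix inventory in Latin transliteration (`STEMS`, `PLURAL`, `CASE`), a morpheme-level FSM that allows at most one plural affix followed by at most one case affix, the helper names (`prefix_stems`, `fsm_affixes`, `forward_mm`, `backward_mm`, `segment`), and the tie-break between forward and backward maximum matching. Candidate stems are found by narrowing a sorted stem list one letter at a time with two binary searches per letter; the FSM is tried on the remainder after each candidate (longest first); only if no candidate is accepted does the bidirectional-matching fallback run. Omni-word segmentation and the paper's morphological checks are not modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data: Latin-transliterated Kazakh stems and affixes.
STEMS = sorted(["bala", "bal", "kitap", "kitaphana", "mektep", "qala"])
PLURAL = ("lar", "ler", "dar", "der", "tar", "ter")
CASE = ("da", "de", "ta", "te", "dyng", "ding", "nyng", "ning", "tyng", "ting")

# Hypothetical morpheme-level FSM: state -> [(affix class, next state)].
# 0 = bare stem, 1 = after a plural affix, 2 = after a case affix.
FSM = {0: [(PLURAL, 1), (CASE, 2)], 1: [(CASE, 2)], 2: []}
ACCEPTING = {0, 1, 2}

LEXICON = set(STEMS) | set(PLURAL) | set(CASE)
MAXLEN = max(len(w) for w in LEXICON)

def _char_at(word: str, i: int) -> str:
    """Letter at index i, or '' when the word is shorter ('' sorts first)."""
    return word[i] if i < len(word) else ""

def _narrow(lo: int, hi: int, i: int, ch: str) -> Tuple[int, int]:
    """Binary-search STEMS[lo:hi] (all sharing the same i-letter prefix)
    for the sub-range whose i-th letter equals ch."""
    a, b = lo, hi
    while a < b:  # lower bound: first entry whose i-th letter >= ch
        mid = (a + b) // 2
        if _char_at(STEMS[mid], i) < ch:
            a = mid + 1
        else:
            b = mid
    start, b = a, hi
    while a < b:  # upper bound: first entry whose i-th letter > ch
        mid = (a + b) // 2
        if _char_at(STEMS[mid], i) <= ch:
            a = mid + 1
        else:
            b = mid
    return start, a

def prefix_stems(word: str) -> List[str]:
    """All dictionary stems that are prefixes of word, longest first."""
    found, lo, hi = [], 0, len(STEMS)
    for i, ch in enumerate(word):
        lo, hi = _narrow(lo, hi, i, ch)
        if lo == hi:
            break
        if len(STEMS[lo]) == i + 1:  # an exact match sorts first in its range
            found.append(STEMS[lo])
    return found[::-1]

def fsm_affixes(rest: str, state: int = 0) -> Optional[List[str]]:
    """Split rest into affixes along FSM transitions; None if no accepting path."""
    if not rest:
        return [] if state in ACCEPTING else None
    for affixes, nxt in FSM[state]:
        for affix in affixes:
            if rest.startswith(affix):
                tail = fsm_affixes(rest[len(affix):], nxt)
                if tail is not None:
                    return [affix] + tail
    return None

def forward_mm(word: str) -> List[str]:
    """Forward maximum matching; unknown letters become single-letter pieces."""
    out, i = [], 0
    while i < len(word):
        for j in range(min(len(word), i + MAXLEN), i, -1):
            if word[i:j] in LEXICON or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return out

def backward_mm(word: str) -> List[str]:
    """Backward maximum matching, scanning from the end of the word."""
    out, j = [], len(word)
    while j > 0:
        for i in range(max(0, j - MAXLEN), j):
            if word[i:j] in LEXICON or i == j - 1:
                out.insert(0, word[i:j])
                j = i
                break
    return out

def segment(word: str) -> Tuple[List[str], str]:
    """Try the FSM on each candidate stem (longest first); else fall back."""
    for stem in prefix_stems(word):
        affixes = fsm_affixes(word[len(stem):])
        if affixes is not None:
            return [stem] + affixes, "fsm"
    fwd, bwd = forward_mm(word), backward_mm(word)
    # Hypothetical tie-break: fewer pieces wins; on a tie, keep backward.
    return (fwd if len(fwd) < len(bwd) else bwd), "bidirectional"
```

For example, `segment("balalarda")` (stem *bala* "child", plural, locative) returns `(["bala", "lar", "da"], "fsm")`: the letter-by-letter search yields the candidates `"bala"` and `"bal"`, and the FSM accepts the remainder after the longer one. `segment("kitapdalar")`, whose case-before-plural order this toy FSM rejects, falls back to matching and returns `(["kitap", "da", "lar"], "bidirectional")`. Narrowing one sorted range per letter means every stem that prefixes the word is collected in a single left-to-right pass instead of one dictionary search per candidate prefix; this is one plausible reading of the "binary-seek-by-character" mechanism, not a confirmed description of the authors' design.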

Key words: affix segmentation, FSM, bidirectional matching algorithm, omni-word segmentation algorithm

DAWEL Abilhaye, GULILA Altenbek. Study and implementation of Kazakh lexical scanner[J]. Computer Engineering and Applications, 2008, 44(19): 146-149.
