Study and implementation of Kazakh lexical scanner

Computer Engineering and Applications, 2008, Vol. 44, Issue 19: 146-149.

• Database, Signal and Information Processing •

DAWEL Abilhaye, GULILA Altenbek

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Received: 2007-10-29 Revised: 2008-02-21 Online: 2008-07-01 Published: 2008-07-01
Contact: DAWEL Abilhaye

Abstract

Abstract: This paper studies affix segmentation and stem extraction in the automatic morphological analysis of Kazakh and implements a Kazakh morphological analysis system, KazStemmer, which automatically performs stem segmentation and tagging on Kazakh corpora. The system first analyzes each word to be segmented with a finite-state machine (FSM). If the FSM succeeds, its output is taken as the segmentation result; otherwise, the system falls back to an improved method that combines bidirectional full (omni-word) segmentation with morphological analysis. Compared with the maximum matching algorithm, this method achieves higher stem-extraction accuracy and faster segmentation. In addition, stem-table lookup uses, for the first time, an improved letter-by-letter binary search dictionary query mechanism, which improves the efficiency of stem extraction and significantly affects segmentation speed.

Key words: affix segmentation, finite-state machine (FSM), bidirectional matching, full (omni-word) segmentation
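
The abstract does not spell out how KazStemmer works internally, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions: a stem lookup that narrows a sorted stem list one letter at a time by binary search, and a forward full segmentation that enumerates every stem plus suffix-chain split. The stem list, the suffix set, and all function names are hypothetical toy choices in Cyrillic script, not the paper's data or code. The FSM stage, the backward pass of the bidirectional method, and the morphological filtering are not shown.

```python
# Toy data, assumed for illustration only (not the paper's stem table or affix inventory).
STEMS = sorted(["бал", "бала", "кітап", "мектеп"])
SUFFIXES = {"лар", "тар", "тер", "ға", "да", "те"}


def _lower(stems, prefix, lo, hi):
    """First index in [lo, hi) whose entry, truncated to len(prefix), is >= prefix."""
    k = len(prefix)
    while lo < hi:
        mid = (lo + hi) // 2
        if stems[mid][:k] < prefix:
            lo = mid + 1
        else:
            hi = mid
    return lo


def _upper(stems, prefix, lo, hi):
    """First index in [lo, hi) whose entry, truncated to len(prefix), is > prefix."""
    k = len(prefix)
    while lo < hi:
        mid = (lo + hi) // 2
        if stems[mid][:k] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return lo


def stem_prefixes(word, stems):
    """Return every prefix of `word` found in the sorted list `stems`.

    Each additional letter shrinks the candidate range [lo, hi) by binary
    search, so later letters only search among entries sharing the prefix.
    """
    lo, hi = 0, len(stems)
    found = []
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        lo, hi = _lower(stems, prefix, lo, hi), _upper(stems, prefix, lo, hi)
        if lo == hi:          # no stem starts with this prefix: stop early
            break
        if stems[lo] == prefix:  # an exact match sorts first within the range
            found.append(prefix)
    return found


def split_suffixes(rest, suffixes):
    """Return all ways to write `rest` as a sequence of known suffixes."""
    if not rest:
        return [[]]
    chains = []
    for k in range(1, len(rest) + 1):
        if rest[:k] in suffixes:
            for tail in split_suffixes(rest[k:], suffixes):
                chains.append([rest[:k]] + tail)
    return chains


def segment(word, stems, suffixes):
    """Enumerate every (stem, suffix chain) analysis of `word`."""
    return [(stem, chain)
            for stem in stem_prefixes(word, stems)
            for chain in split_suffixes(word[len(stem):], suffixes)]


print(segment("балаларға", STEMS, SUFFIXES))
```

With this toy data, both "бал" and "бала" are found as candidate stems of "балаларға", but only "бала" leaves a remainder that splits into known suffixes. A maximum matching approach would commit to a single longest stem, whereas the full enumeration keeps every candidate for a later disambiguation step to choose from.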

DAWEL Abilhaye,GULILA Altenbek. Study and implementation of Kazakh lexical scanner[J]. Computer Engineering and Applications, 2008, 44(19): 146-149.

