Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (12): 175-177.
• Database, Signal and Information Processing •
ZHANG Guo-bing1,2,LI Miao1
张国兵1,2,李 淼1
Abstract: This paper introduces the concept of a local ambiguity word grid. To handle overlay ambiguity in Chinese word segmentation, it proposes an iterative algorithm for training an overlay ambiguity dictionary, which yields a dictionary of candidate entries for overlay ambiguity. Building on this, the paper presents a word segmentation algorithm based on the local ambiguity word grid that can detect both the compounding ambiguity and the overlay ambiguity arising during Chinese word segmentation. The algorithm processes only the local ambiguity word grid rather than the entire ambiguous section, and it reduces the handling of overlay ambiguity to a lookup in the candidate dictionary, so its time complexity drops substantially. Experiments show that the algorithm segments Chinese text quickly and that its segmentation accuracy exceeds 97%.
Key words: Chinese word segmentation, overlay ambiguity, overlapping ambiguity, local ambiguity word grid
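Abstract (translated from the Chinese): The concept of a local ambiguity word grid is proposed. For overlay ambiguity in Chinese word segmentation, an algorithm that trains an overlay ambiguity dictionary iteratively is proposed, yielding a dictionary of candidate entries for overlay ambiguity. On this basis, a segmentation algorithm based on the local ambiguity word grid is proposed that can detect the compounding ambiguity and overlay ambiguity arising during Chinese word segmentation. The algorithm considers only the local ambiguity word grids in which ambiguity exists and reduces the handling of overlay ambiguity to a lookup in the overlay ambiguity candidate dictionary; as a result, its time complexity drops substantially. Experimental results show that the algorithm achieves fast Chinese word segmentation with a segmentation accuracy above 97%.
Key words (translated from the Chinese): Chinese word segmentation, overlay ambiguity, overlapping ambiguity, local ambiguity word grid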
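The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that a local ambiguity word grid can be approximated as the smallest span covered by a chain of overlapping dictionary words, that candidates inside that span are ranked by unigram frequency, and that the overlay ambiguity candidate dictionary is a plain lookup table. The lexicon, the frequencies, the table contents, and every function name are hypothetical.

```python
import math

# Toy lexicon with made-up frequencies (assumption, not data from the paper).
LEXICON = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5, "起源": 30,
    "从": 60, "马上": 30, "马": 10, "上": 40, "下来": 30,
}
# Hypothetical overlay-ambiguity candidate dictionary: word -> stored split.
# The paper trains such a dictionary iteratively; here it is set by hand.
OVERLAY_CANDIDATES = {"马上": ("马", "上")}

MAX_LEN = max(len(w) for w in LEXICON)
TOTAL = sum(LEXICON.values())
UNKNOWN_FREQ = 0.5  # assumed pseudo-count for characters not in the lexicon


def longest_end(text, start):
    """End index of the longest lexicon word at `start` (one char if none)."""
    for length in range(min(MAX_LEN, len(text) - start), 1, -1):
        if text[start:start + length] in LEXICON:
            return start + length
    return start + 1


def local_region(text, start):
    """Grow the span while a word starting inside it crosses its right edge."""
    end = longest_end(text, start)
    ambiguous = False
    j = start + 1
    while j < end:
        e = longest_end(text, j)
        if e > end:  # overlapping ambiguity: extend the local region
            end, ambiguous = e, True
        j += 1
    return end, ambiguous


def best_path(region):
    """Best-scoring segmentation of a short region (unigram log-probability)."""
    n = len(region)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, path_i = best[i]
        for j in range(i + 1, min(n, i + MAX_LEN) + 1):
            piece = region[i:j]
            if j - i > 1 and piece not in LEXICON:
                continue  # multi-character pieces must be lexicon words
            score = score_i + math.log(LEXICON.get(piece, UNKNOWN_FREQ) / TOTAL)
            if score > best[j][0]:
                best[j] = (score, path_i + [piece])
    return best[n][1]


def segment(text):
    """Search only inside ambiguous local regions; elsewhere do a lookup."""
    out, i = [], 0
    while i < len(text):
        end, ambiguous = local_region(text, i)
        piece = text[i:end]
        if ambiguous:
            out.extend(best_path(piece))
        else:
            out.extend(OVERLAY_CANDIDATES.get(piece, (piece,)))
        i = end
    return out


print(segment("研究生命起源"))  # ['研究', '生命', '起源']
print(segment("从马上下来"))    # ['从', '马', '上', '下来']
```

In the first call only the four-character span 研究生命 is searched, because 研究生 and 生命 overlap; 起源 is emitted directly. In the second call no overlap is found, so the only extra work is one lookup in the hand-set candidate table. A stored split that ignores context is a simplification made for this sketch.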
ZHANG Guo-bing, LI Miao. Rapid word segmentation algorithm based on local ambiguity word grid[J]. Computer Engineering and Applications, 2008, 44(12): 175-177.
张国兵, 李淼. 一种基于局部歧义词网格的快速分词算法[J]. 计算机工程与应用, 2008, 44(12): 175-177.
URL: http://cea.ceaj.org/EN/Y2008/V44/I12/175