Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (29): 128-130.DOI: 10.3778/j.issn.1002-8331.2009.29.038

• 数据库、信号与信息处理 • Previous Articles     Next Articles

Double-array Trie based on genetic algorithm and idea of Sherwood

WANG Shi-kun,LI Shao-zi,KE Xiao   

  1. Department of Cognitive Science,Xiamen University,Xiamen,Fujian 361005,China
  • Received:2008-06-02 Revised:2008-08-18 Online:2009-10-11 Published:2009-10-11
  • Contact: WANG Shi-kun

基于遗传算法和舍伍德思想的双数组Trie树改进

王世昆,李绍滋,柯 逍   

  1. 厦门大学 智能科学与技术系,福建 厦门 361005
  • 通讯作者: 王世昆

Abstract: In the Chinese information processing Chinese dictionary is enquired.When involved in a large scale of dictionary,how fast the visit to obtain knowledge of words will become a need to focus on resolving problems.This paper will outline a simple and efficient mechanism which is based on double-array Trie principle for the dictionary.For enquiries about the time of the algorithm is O(n)of linear time(n is the length of term enquiries).This shows that there is a clear advantage in double-array Trie.But there is a serious waste in storage of double-array Trie.Predecessors put forward some solutions.One of the major:Using a heuristic sort strategy.That is,each time the active node is dealt with first which has the largest branch nodes.Considering that such solution is a heuristic algorithm for deterministic algorithm,it will be easy to catch the trap of local optimal solution.On the basis of that kind of mentality,this paper introduces the idea of Sherwood random thoughts and mutation of genetic algorithms to improve the performance of double-array Trie.

Key words: Double-Array Trie(DAT), Sherwood algorithms, genetic algorithms, mutation

摘要: 对汉语信息处理中常常要涉及汉语词典查询,当所涉及的词典规模较为庞大时如何快速访问词典以获取词语知识便成为了一个需重点解决的问题。将阐述一种简单快捷的基于双数组Trie(Double-Array Trie)原理的词典查询机制。该算法的查询时间为O(n)的线性时间(n为查询词条的长度),由此可见双数组算法在时间上存在着明显优势,但在空间耗费上却存在着浪费现象。前人提出了一些解决方案,其中主要的有:在构造双数组时采用一种启发式排序策略,即每一次都先处理当前分支节点最多的活动节点。考虑到这种启发式思想为确定性算法,容易陷入局部最优陷阱之中,因此在这种思想的基础上引入了舍伍德随机思想和遗传算法中常常运用到的变异思想,在改进算法空间利用率的同时也使得算法跳出了局部最优解的陷阱。

关键词: 双数组索引, 舍伍德随机思想, 遗传算法, 变异

CLC Number: