Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (21): 199-201. DOI: 10.3778/j.issn.1002-8331.2008.21.054

• Machine Learning •

Extendable digital dictionary for automatic Chinese word segmentation

HE Sheng¹, QU Wei-guang², XU Chao¹

  1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China
  2. Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
  • Received: 2008-04-30 Revised: 2008-05-20 Online: 2008-07-21 Published: 2008-07-21
  • Contact: HE Sheng

Abstract: The digital dictionary is an important component of automatic Chinese word segmentation and part-of-speech tagging systems, and one of the main factors affecting their performance. This paper describes the query functions a digital dictionary should provide and the organizational structures commonly used for it, and proposes an extendable dictionary mechanism that combines a system dictionary with a user dictionary. The system dictionary is indexed by a hash on the initial character and searched by character-by-character binary search. The user dictionary is also indexed by a hash on the initial character, but stores its entries in linked lists. Experiments show that the mechanism is extendable and practical.

Key words: digital dictionary, dictionary structure, automatic word segmentation, hash
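
The two-part structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`SystemDictionary`, `UserDictionary`, `ExtendableDictionary`), the `matches` query, and the sample words are assumptions made for illustration, and a Python `dict` stands in for the initial-character hash table. The system dictionary is built once and searched by narrowing a sorted range one character at a time with binary search; the user dictionary accepts new words at run time by prepending them to a per-character linked list.

```python
class SystemDictionary:
    """Read-only dictionary: initial-character hash -> sorted word tails."""

    def __init__(self, words):
        buckets = {}
        for word in words:
            if word:
                buckets.setdefault(word[0], set()).add(word[1:])
        # Sort once so that every lookup can use binary search.
        self._buckets = {first: sorted(tails) for first, tails in buckets.items()}

    @staticmethod
    def _bound(tails, lo, hi, pos, ch, upper):
        """Binary search in tails[lo:hi] on the character at index pos."""
        while lo < hi:
            mid = (lo + hi) // 2
            cur = tails[mid][pos:pos + 1]  # '' when the tail is too short
            if cur < ch or (upper and cur == ch):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def matches(self, text, start=0):
        """Return every dictionary word that begins at text[start]."""
        if start >= len(text):
            return []
        tails = self._buckets.get(text[start])
        if tails is None:
            return []
        found = []
        lo, hi, pos = 0, len(tails), 0
        while lo < hi:
            # All tails in [lo, hi) share the first pos characters of the text,
            # so a tail of length pos is an exact word match.
            if len(tails[lo]) == pos:
                found.append(text[start:start + 1 + pos])
            nxt = start + 1 + pos
            if nxt >= len(text):
                break
            ch = text[nxt]
            lo, hi = (self._bound(tails, lo, hi, pos, ch, False),
                      self._bound(tails, lo, hi, pos, ch, True))
            pos += 1
        return found


class _Node:
    """One linked-list cell holding a word tail."""
    __slots__ = ("tail", "next")

    def __init__(self, tail, next_node):
        self.tail = tail
        self.next = next_node


class UserDictionary:
    """Growable dictionary: initial-character hash -> linked list of tails."""

    def __init__(self):
        self._heads = {}

    def add(self, word):
        if word and word not in self.matches(word):
            # Prepending keeps insertion cheap; no re-sorting is needed.
            self._heads[word[0]] = _Node(word[1:], self._heads.get(word[0]))

    def matches(self, text, start=0):
        if start >= len(text):
            return []
        found = []
        node = self._heads.get(text[start])
        while node is not None:
            if text.startswith(node.tail, start + 1):
                found.append(text[start] + node.tail)
            node = node.next
        return found


class ExtendableDictionary:
    """System dictionary plus user dictionary behind one query interface."""

    def __init__(self, system_words):
        self.system = SystemDictionary(system_words)
        self.user = UserDictionary()

    def add_user_word(self, word):
        self.user.add(word)

    def matches(self, text, start=0):
        merged = set(self.system.matches(text, start))
        merged.update(self.user.matches(text, start))
        return sorted(merged, key=len)


lexicon = ExtendableDictionary(["中国", "中国人", "中华", "中", "国人", "人民"])
before = lexicon.matches("中国人民")
lexicon.add_user_word("中国人民")
after = lexicon.matches("中国人民")
```

In this sketch the sorted system lists are never modified after construction, so they stay compact and fast to search, while the linked lists absorb user additions without rebuilding anything. That split is one plausible reading of why the two dictionaries use different structures.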