Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue (10): 144-146.

• Database, Signal and Information Processing •

Chinese word segmentation based on dictionary and statistics (基于字典和统计的分词方法)

CHEN Ping (陈 平), LIU Xiao-xia (刘晓霞), LI Ya-jun (李亚军)

  1. Institute of Information Science & Technology, Northwest University, Xi’an 710127, China
  • Received: 2007-07-23 Revised: 2007-10-18 Online: 2008-04-01 Published: 2008-04-01
  • Contact: CHEN Ping

Abstract: This paper proposes a Chinese word segmentation method that combines a dictionary with statistics. An improved dictionary structure enables fast segmentation. Building on that result, a statistical step handles the unregistered (out-of-vocabulary) words left over from the dictionary pass. The method also resolves most crossing (overlapping) ambiguities.

Key words: dictionary-based word segmentation, statistics-based word segmentation, crossing ambiguity, unregistered words
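
The abstract describes a two-stage pipeline: a fast dictionary pass followed by a statistical pass over the leftover unregistered words. It does not specify the dictionary structure or the statistic used, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes a dictionary indexed by first character, forward maximum matching for the dictionary pass, and a raw character-bigram count threshold for merging leftover single characters. All function names and the `min_count` parameter are hypothetical, and crossing-ambiguity resolution is not covered.

```python
from collections import Counter, defaultdict


def build_index(words):
    """Group dictionary words by first character, longest first (an assumed layout)."""
    index = defaultdict(list)
    for w in words:
        index[w[0]].append(w)
    for bucket in index.values():
        bucket.sort(key=len, reverse=True)
    return index


def dict_segment(text, index):
    """Forward maximum matching; unmatched characters become single-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        candidates = index.get(text[i], ())
        # Take the longest dictionary word starting at position i, else the character itself.
        match = next((w for w in candidates if text.startswith(w, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def count_bigrams(corpus):
    """Count adjacent character pairs over an iterable of raw text lines."""
    counts = Counter()
    for line in corpus:
        counts.update(line[i:i + 2] for i in range(len(line) - 1))
    return counts


def merge_unknown(tokens, bigram_counts, known, min_count=2):
    """Join leftover single characters when the adjacent pair is frequent (assumed heuristic)."""
    out = []
    for tok in tokens:
        prev = out[-1] if out else None
        if (prev is not None and len(tok) == 1
                and tok not in known and prev not in known
                and bigram_counts.get(prev[-1] + tok, 0) >= min_count):
            out[-1] = prev + tok  # extend the candidate unregistered word
        else:
            out.append(tok)
    return out
```

With the toy dictionary {研究, 分词} and the input 陈平研究分词, the dictionary pass leaves 陈 and 平 as single characters. If the bigram 陈平 occurs at least `min_count` times in the counting corpus, the merge step joins them into one candidate word.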