Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (10): 144-146.

• Database, Signal and Information Processing •

Chinese word segmentation based on dictionary and statistics

CHEN Ping,LIU Xiao-xia,LI Ya-jun   

  1. Institute of Information Science & Technology,Northwest University,Xi’an 710127,China
  • Received:2007-07-23 Revised:2007-10-18 Online:2008-04-01 Published:2008-04-01
  • Contact: CHEN Ping

基于字典和统计的分词方法

陈 平,刘晓霞,李亚军   

  1. 西北大学 信息科学与技术学院,西安 710127
  • 通讯作者: 陈 平

Abstract: This paper proposes a Chinese word segmentation method that combines a dictionary with statistics. An improved dictionary structure enables fast segmentation. Statistical methods are then applied to the unregistered (out-of-vocabulary) words left over from the dictionary step, and the approach also resolves most crossing ambiguities.

Key words: dictionary-based word segmentation, statistics-based word segmentation, crossing ambiguity, unregistered words

摘要: 提出了一种基于字典与统计相结合的中文分词方法,该方法利用改进的字典结构能够快速切分,在其基础上进一步利用统计的方法处理所产生未登录词,并且能解决大部分交集歧义问题。

关键词: 基于字典的分词, 基于统计的分词, 交叉歧义, 未登录词
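
The two-stage idea in the abstract (a dictionary pass followed by a statistical pass over the leftover unregistered characters) can be illustrated with a minimal sketch. The abstract does not describe the improved dictionary structure or the statistical model, so everything below is an assumption made for illustration: a plain Python set stands in for the dictionary, forward maximum matching stands in for the dictionary pass, and raw adjacent-character counts with a hypothetical `min_count` threshold stand in for the statistics. Crossing-ambiguity resolution is not covered.

```python
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match pass; characters with no dictionary match become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def count_bigrams(corpus):
    """Count adjacent character pairs in an iterable of strings (assumed statistic)."""
    counts = Counter()
    for line in corpus:
        for a, b in zip(line, line[1:]):
            counts[a + b] += 1
    return counts


def merge_unknown(tokens, dictionary, bigram_counts, min_count=2):
    """Join adjacent non-dictionary single characters whose character pair is frequent."""
    merged = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (prev is not None
                and len(tok) == 1 and tok not in dictionary
                and prev not in dictionary
                and bigram_counts[prev[-1] + tok] >= min_count):
            merged[-1] = prev + tok  # extend the candidate unregistered word
        else:
            merged.append(tok)
    return merged


# Toy data, invented for this example only.
dictionary = {"我们", "在", "学习", "的"}
corpus = ["西安是古城", "他去西安", "西安很大"]

first_pass = forward_max_match("我们在西安学习", dictionary)
second_pass = merge_unknown(first_pass, dictionary, count_bigrams(corpus))
print(first_pass)
print(second_pass)
```

In this toy run the dictionary pass leaves 西 and 安 as isolated characters, and the statistical pass joins them into 西安 because that pair occurs often enough in the sample corpus. A practical system would likely replace the raw count threshold with a stronger association measure and a much larger corpus.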