Chinese word segmentation based on dictionary and statistics

Computer Engineering and Applications, 2008, Vol. 44, Issue (10): 144-146.

Section: Database, Signal and Information Processing

CHEN Ping, LIU Xiao-xia, LI Ya-jun

Institute of Information Science & Technology, Northwest University, Xi’an 710127, China

Received:2007-07-23 Revised:2007-10-18 Online:2008-04-01 Published:2008-04-01
Contact: CHEN Ping

Abstract

This paper proposes a Chinese word segmentation method that combines a dictionary with statistics. An improved dictionary structure lets the method segment text quickly; building on that first pass, statistical methods then handle the unregistered (out-of-vocabulary) words the pass leaves behind. The method also resolves most overlapping (crossing) ambiguities.

Key words: dictionary-based word segmentation, statistics-based word segmentation, crossing ambiguity, unregistered words
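The abstract names three ingredients (a fast dictionary pass, a statistical pass for unregistered words, and crossing-ambiguity resolution) but does not describe the improved dictionary structure or the statistical model. The Python sketch below is therefore a generic illustration under stated assumptions, not the authors' method. Every choice in it is hypothetical: the first-character-indexed dictionary (`FirstCharDict`), flagging an ambiguity when forward and backward maximum matching disagree, picking between them with add-one-smoothed unigram scores, and gluing single-character fragments into unregistered words when their adjacent-pair count reaches a threshold. All names and the toy counts are illustrative.

```python
import math


class FirstCharDict:
    """Hypothetical dictionary structure: words bucketed by first character.

    Each bucket also records the longest word that starts with that character,
    so forward matching never probes substrings longer than that word.
    """

    def __init__(self, words):
        self.buckets = {}  # first char -> (longest length, set of words)
        self.global_max = 1  # longest word overall, used by backward matching
        for w in words:
            longest, bucket = self.buckets.get(w[0], (0, set()))
            bucket.add(w)
            self.buckets[w[0]] = (max(longest, len(w)), bucket)
            self.global_max = max(self.global_max, len(w))

    def contains(self, piece):
        entry = self.buckets.get(piece[0])
        return entry is not None and piece in entry[1]

    def max_len(self, ch):
        entry = self.buckets.get(ch)
        return entry[0] if entry else 1


def forward_mm(text, d):
    """Forward maximum matching: take the longest dictionary word at each step."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(d.max_len(text[i]), len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or d.contains(piece):  # fall back to a single character
                tokens.append(piece)
                i += n
                break
    return tokens


def backward_mm(text, d):
    """Backward maximum matching: same idea, scanning from the end of the text."""
    tokens, j = [], len(text)
    while j > 0:
        for n in range(min(d.global_max, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or d.contains(piece):
                tokens.append(piece)
                j -= n
                break
    return tokens[::-1]


def log_prob(tokens, freq, total):
    """Unigram log-probability with add-one smoothing (one slot for unseen tokens)."""
    denom = total + len(freq) + 1
    return sum(math.log((freq.get(t, 0) + 1) / denom) for t in tokens)


def merge_unknown(tokens, pair_counts, threshold=2):
    """Hypothetical OOV step: glue runs of single-character fragments whose
    adjacent-character pair count reaches the threshold."""
    out, mergeable = [], False
    for t in tokens:
        if len(t) == 1 and mergeable and pair_counts.get(out[-1][-1] + t, 0) >= threshold:
            out[-1] += t  # extend the current fragment run
        else:
            out.append(t)
            mergeable = len(t) == 1  # only single-character tokens start a run
    return out


def segment(text, d, freq, pair_counts, threshold=2):
    fwd, bwd = forward_mm(text, d), backward_mm(text, d)
    if fwd == bwd:
        best = fwd
    else:
        # The passes disagree, which signals an ambiguous span; let unigram
        # statistics choose the more probable of the two segmentations.
        total = sum(freq.values())
        best = max(fwd, bwd, key=lambda toks: log_prob(toks, freq, total))
    return merge_unknown(best, pair_counts, threshold)


if __name__ == "__main__":
    d = FirstCharDict(["研究", "研究生", "生命"])
    freq = {"研究": 80, "生命": 60, "研究生": 30, "命": 5}
    print(segment("研究生命", d, freq, {}))  # ['研究', '生命']
    print(segment("张三丰", d, freq, {"张三": 3, "三丰": 3}))  # ['张三丰']
```

In the first usage line, forward matching yields 研究生/命 and backward matching yields 研究/生命; the disagreement marks a crossing ambiguity, and the unigram counts favor 研究/生命. Comparing whole-sentence segmentations keeps the sketch short; a finer-grained variant would isolate only the span where the two passes diverge and score that span alone.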

Cite this article: CHEN Ping, LIU Xiao-xia, LI Ya-jun. Chinese word segmentation based on dictionary and statistics[J]. Computer Engineering and Applications, 2008, 44(10): 144-146.
