Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (11): 175-178.

• Database and Information Processing •

An Effective Method for Resolving Ambiguous Segmentation in Chinese

ZHU Jian (朱鉴), ZHANG Jian (张建), LI Miao (李淼)

  1. Hefei Institute of Intelligent Machines, Chinese Academy of Sciences
  • Received: 2006-05-12 Revised: 1900-01-01 Online: 2007-04-11 Published: 2007-04-11
  • Corresponding author: ZHANG Jian (张建)

An Effective Method for Resolving Chinese Ambiguous Segmentation


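Abstract (from the Chinese): This paper proposes a multi-layer filtering method that combines a directed graph with statistics and rules to resolve overlapping (intersection-type) ambiguity in Chinese word segmentation. The method substantially improves segmentation accuracy. In an open test on a corpus of 65,000 characters, we tallied its segmentation results on overlapping ambiguous strings and found an accuracy of 98.43%, which indicates that the method is effective for resolving overlapping ambiguity in Chinese.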

Keywords (from the Chinese): directed graph, statistical model, rule base, ambiguous string, Chinese character segmentation

Abstract: This paper presents a method that combines a directed graph with statistical and rule-based filtering to resolve overlapping ambiguity in Chinese word segmentation. In an open test on a Chinese corpus of 65,000 characters, segmentation accuracy for ambiguous phrases of the overlapping type reached 98.73%, which indicates that the method is effective at resolving this type of ambiguity.

Key words: directed graph, statistical model, rule library, ambiguous phrase, Chinese word segmentation
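
The abstract names three ingredients (a directed graph, a statistical model, and a rule base) without giving details, so the following is only a minimal sketch of how such a layered filter might be arranged, not the authors' implementation. The lexicon, its counts, the banned-pair rule, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy lexicon with made-up counts (not taken from the paper).
LEXICON = {
    "结": 5, "合": 5, "成": 20, "分": 5, "子": 5,
    "结合": 40, "合成": 30, "成分": 30, "分子": 40,
}
TOTAL = sum(LEXICON.values())

# Hypothetical rule base: adjacent word pairs that a rule layer rejects.
BANNED_PAIRS = {("成分", "子")}


def build_graph(text):
    """Node i is the gap before text[i]; edge i -> j means text[i:j] is a candidate word."""
    graph = {i: [i + 1] for i in range(len(text))}  # single characters are always allowed
    for i in range(len(text)):
        for j in range(i + 2, len(text) + 1):
            if text[i:j] in LEXICON:
                graph[i].append(j)
    return graph


def has_overlap_ambiguity(graph):
    """True if two multi-character edges (i, j) and (k, l) satisfy i < k < j < l."""
    edges = [(i, j) for i, ends in graph.items() for j in ends if j - i > 1]
    return any(i < k < j < l for (i, j) in edges for (k, l) in edges)


def all_paths(graph, start, end):
    """Enumerate every path from start to end as a list of (i, j) edges."""
    if start == end:
        return [[]]
    return [[(start, nxt)] + rest
            for nxt in graph.get(start, [])
            for rest in all_paths(graph, nxt, end)]


def passes_rules(words):
    """Rule layer: reject a candidate containing any banned adjacent pair."""
    return all(pair not in BANNED_PAIRS for pair in zip(words, words[1:]))


def score(words):
    """Statistical layer: unigram log-probability; unknown words get a count of 1."""
    return sum(math.log(LEXICON.get(w, 1) / TOTAL) for w in words)


def segment(text):
    graph = build_graph(text)
    candidates = [[text[i:j] for i, j in path]
                  for path in all_paths(graph, 0, len(text))]
    # Apply the rule layer first, but never let it discard every candidate.
    kept = [c for c in candidates if passes_rules(c)] or candidates
    return max(kept, key=score)
```

With this toy data, `segment("结合成分子")` returns `['结合', '成', '分子']`, and `has_overlap_ambiguity(build_graph("结合成分子"))` is `True` because candidate words such as 结合 and 合成 share a character. Exhaustive path enumeration is used only for brevity; a practical segmenter would more likely use dynamic programming over the graph.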