计算机工程与应用 (Computer Engineering and Applications) ›› 2009, Vol. 45 ›› Issue (22): 123-125. DOI: 10.3778/j.issn.1002-8331.2009.22.040

• Database and Information Processing •

Method of Chinese word segmentation based on directed graph

ZHANG Pei-ying   

  1. College of Computer & Communication Engineering, China University of Petroleum (East China), Dongying, Shandong 257061, China
  • Received: 2008-04-30 Revised: 2008-07-23 Online: 2009-08-01 Published: 2009-08-01
  • Contact: ZHANG Pei-ying

Abstract: Chinese word segmentation is the first step in Chinese information processing, and its difficulty has seriously hindered the development of the field. This paper first describes the role of word segmentation in Chinese information processing and then introduces the key technologies used in segmentation systems. It proposes a Chinese word segmentation algorithm based on a directed graph. The algorithm first constructs a Chinese segmentation directed graph, then enumerates all possible segmentation paths in that graph, and finally scores each path using the least-segmentation (fewest words) principle, the mutual information between Chinese characters, and word frequency. The highest-scoring path is taken to be the correct segmentation. Open-test results show that segmentation precision exceeds 90%.
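
The three steps named in the abstract (build the graph, enumerate the paths, score each path) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the scoring formula, so the weighted sum in `score_path`, the weights `alpha`, `beta`, `gamma`, the `max_len` limit, and all statistics in `WORD_FREQ`, `CHAR_PROB` and `PAIR_PROB` are assumptions invented for illustration.

```python
import math
from typing import Dict, List, Tuple

# Toy statistics, invented for illustration only (not taken from the paper).
WORD_FREQ = {"研究": 500, "研究生": 100, "生命": 400, "命": 50, "起源": 200}
CHAR_PROB = {c: 0.01 for c in "研究生命起源"}
PAIR_PROB = {"研究": 0.005, "究生": 0.001, "生命": 0.004, "起源": 0.003}

Path = List[Tuple[int, int]]


def build_graph(sentence: str, word_freq: Dict[str, int],
                max_len: int = 4) -> Dict[int, List[int]]:
    """Node i is the gap before character i; edge i -> j is the candidate word sentence[i:j]."""
    n = len(sentence)
    edges: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            # Single characters are always allowed, so at least one path exists.
            if j == i + 1 or sentence[i:j] in word_freq:
                edges[i].append(j)
    return edges


def all_paths(edges: Dict[int, List[int]], n: int) -> List[Path]:
    """Enumerate every path from node 0 to node n by depth-first search."""
    paths: List[Path] = []

    def walk(i: int, path: Path) -> None:
        if i == n:
            paths.append(list(path))
            return
        for j in edges[i]:
            path.append((i, j))
            walk(j, path)
            path.pop()

    walk(0, [])
    return paths


def mutual_information(a: str, b: str, char_prob: Dict[str, float],
                       pair_prob: Dict[str, float]) -> float:
    """Pointwise mutual information of adjacent characters; 0.0 if any statistic is missing."""
    if a not in char_prob or b not in char_prob or a + b not in pair_prob:
        return 0.0
    return math.log(pair_prob[a + b] / (char_prob[a] * char_prob[b]))


def score_path(sentence: str, path: Path, word_freq: Dict[str, int],
               char_prob: Dict[str, float], pair_prob: Dict[str, float],
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Assumed scoring: penalise word count, reward word frequency and in-word mutual information."""
    words = [sentence[i:j] for i, j in path]
    freq_term = sum(math.log1p(word_freq.get(w, 0)) for w in words)
    mi_term = sum(mutual_information(w[k], w[k + 1], char_prob, pair_prob)
                  for w in words for k in range(len(w) - 1))
    return -alpha * len(words) + beta * freq_term + gamma * mi_term


def segment(sentence: str, word_freq: Dict[str, int],
            char_prob: Dict[str, float], pair_prob: Dict[str, float]) -> List[str]:
    """Return the words along the highest-scoring path."""
    paths = all_paths(build_graph(sentence, word_freq), len(sentence))
    best = max(paths, key=lambda p: score_path(sentence, p, word_freq, char_prob, pair_prob))
    return [sentence[i:j] for i, j in best]


if __name__ == "__main__":
    print(segment("研究生命起源", WORD_FREQ, CHAR_PROB, PAIR_PROB))
```

With these toy statistics the sketch prefers 研究 / 生命 / 起源 over 研究生 / 命 / 起源: both have three words, so the frequency and mutual-information terms decide. Enumerating every path grows exponentially with sentence length, so a practical system would likely segment clause by clause or replace the enumeration with dynamic programming.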

Key words: Chinese segmentation, directed graph, Chinese segmentation directed graph, segmentation path, mutual information