Computer Engineering and Applications (计算机工程与应用) ›› 2010, Vol. 46 ›› Issue (3): 125-127. DOI: 10.3778/j.issn.1002-8331.2010.03.037

• Database, Signal and Information Processing •

Research on algorithm of Chinese word automatic segmentation (汉语文本自动分词算法的研究)

HE Guo-bin (何国斌), ZHAO Jing-lu (赵晶璐)

  1. College of Computer and Information Science, Southwest University, Chongqing 400715, China
  • Received: 2008-08-05 Revised: 2008-10-28 Online: 2010-01-21 Published: 2010-01-21
  • Corresponding author: HE Guo-bin

Abstract: This paper analyzes the dictionary mechanisms used in Chinese word segmentation and proposes an improved whole-word dictionary structure for segmentation. Building on the characteristics of mechanical (dictionary-matching) segmentation, it combines that approach with a probabilistic method and discusses a probabilistic algorithm for automatic Chinese word segmentation. Dictionary matching is performed with hashing and binary search. Experiments indicate that the algorithm achieves relatively high segmentation speed and accuracy, and that it also performs well at resolving ambiguous segmentations.

Key words: automatic segmentation, segmentation algorithm, dictionary, ambiguous segmentation
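
The abstract names three ingredients: a dictionary searched by hashing plus binary search, mechanical (dictionary-matching) segmentation, and a probabilistic step for resolving ambiguity. The Python sketch below is one possible way these pieces could fit together, not the authors' implementation, whose details are not given on this page. It assumes a first-character hash bucket holding a sorted word list, forward and backward maximum matching as the mechanical step, and a smoothed unigram score to choose between the two candidates. The class and function names, the smoothing, and the word frequencies are all hypothetical and invented for illustration.

```python
import bisect
import math
from collections import defaultdict


class HashedDictionary:
    """Hypothetical dictionary: hash on the first character, then
    binary search inside that character's sorted word list."""

    def __init__(self, word_freqs):
        self.freq = dict(word_freqs)
        self.total = sum(self.freq.values())
        self.max_len = max(len(w) for w in self.freq)
        buckets = defaultdict(list)
        for word in self.freq:
            buckets[word[0]].append(word)
        self.buckets = {ch: sorted(ws) for ch, ws in buckets.items()}

    def contains(self, word):
        bucket = self.buckets.get(word[0], [])    # hash step
        i = bisect.bisect_left(bucket, word)      # binary search step
        return i < len(bucket) and bucket[i] == word

    def log_prob(self, word):
        # Add-one smoothed unigram log-probability (an assumption of this sketch).
        vocab = len(self.freq) + 1
        return math.log((self.freq.get(word, 0) + 1) / (self.total + vocab))


def forward_max_match(text, d):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(d.max_len, len(text) - i), 0, -1):
            # A single character is always accepted as a fallback.
            if n == 1 or d.contains(text[i:i + n]):
                words.append(text[i:i + n])
                i += n
                break
    return words


def backward_max_match(text, d):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(d.max_len, j), 0, -1):
            if n == 1 or d.contains(text[j - n:j]):
                words.append(text[j - n:j])
                j -= n
                break
    return words[::-1]


def segment(text, d):
    """Pick the candidate segmentation with the higher unigram score."""
    candidates = [forward_max_match(text, d), backward_max_match(text, d)]
    return max(candidates, key=lambda ws: sum(d.log_prob(w) for w in ws))


# Frequencies below are made up for illustration only.
d = HashedDictionary({"研究": 50, "研究生": 10, "生命": 40, "命": 5, "的": 200, "起源": 20})
print(forward_max_match("研究生命的起源", d))   # ['研究生', '命', '的', '起源']
print(backward_max_match("研究生命的起源", d))  # ['研究', '生命', '的', '起源']
print(segment("研究生命的起源", d))             # ['研究', '生命', '的', '起源']
```

In this toy example the forward and backward passes disagree on the overlapping string 研究生命, and the unigram score prefers 研究 / 生命 because both words are more frequent in the invented dictionary than 研究生 and 命.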
