Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (15): 97-100.

• Big Data and Cloud Computing •

Research on Chinese domain-specific word segmentation based on conditional random fields

朱艳辉,刘  璟,徐叶强,田海龙,马  进   

  1. 湖南工业大学 计算机与通信学院,湖南 株洲 412007
  • Publication date: 2016-08-01  Release date: 2016-08-12

Chinese word segmentation research based on Conditional Random Field

ZHU Yanhui, LIU Jing, XU Yeqiang, TIAN Hailong, MA Jin   

  1. School of Computer and Communication, Hunan University of Technology, Zhuzhou, Hunan 412007, China
  • Online:2016-08-01 Published:2016-08-12

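Abstract: Because word segmentation with conditional random fields lacks good domain adaptability, a method that combines conditional random fields with a domain dictionary is proposed to improve it. Based on word-formation rules, three disambiguation methods are also proposed: fixed word-string resolution, verb resolution, and word-probability resolution. Experimental results show that the segmentation pipeline and methods improve segmentation accuracy and adaptability, raising the F value of the segmentation results by 7.6% in the computer domain and by 8.7% in the medical domain.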

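Keywords: Chinese word segmentation, conditional random field, domain adaptation, ambiguity resolution, domain word segmentation, reverse maximum matching algorithm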

Abstract: Chinese word segmentation based on Conditional Random Field (CRF) adapts poorly to new domains. A combination of CRF and a domain dictionary is proposed to improve domain adaptability, and, to eliminate ambiguity, this paper uses fixed word collocations, a verb dictionary and word probability, following the rules of word formation. The experimental results show that this approach improves the accuracy and adaptability of word segmentation. The F value of the segmentation results in the computer and medical fields is increased by 7.6% and 8.7%, respectively.
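The abstract reports results as an F value but does not spell out how it is computed. A minimal sketch of the usual word-level precision, recall and F computation for segmentation follows, assuming a predicted word counts as correct only when both of its boundaries match the reference. The function names and the sample sentence are illustrative and are not taken from the paper.

```python
def word_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def segmentation_prf(gold_tokens, pred_tokens):
    """Return (precision, recall, F) for one segmented sentence.

    Assumption: a predicted word is correct only if its exact span
    also appears in the reference segmentation.
    """
    gold = word_spans(gold_tokens)
    pred = word_spans(pred_tokens)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


if __name__ == "__main__":
    gold = ["研究", "生命", "的", "起源"]
    pred = ["研究生", "命", "的", "起源"]
    print(segmentation_prf(gold, pred))
```

In this toy sentence two of the four predicted words match the reference spans, so precision, recall and F are all 0.5.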

Key words: Chinese word segmentation, Conditional Random Field (CRF), domain adaptation, ambiguity resolution, domain segmentation, reverse maximum matching method
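The key words list reverse maximum matching, although the abstract does not describe how it is wired into the CRF pipeline. As a rough illustration only, the generic textbook form of the algorithm can be sketched as below: scan the sentence from right to left and, at each position, take the longest dictionary word that ends there. The function name, the toy dictionary and the single-character fallback are assumptions for this sketch, not details from the paper.

```python
def reverse_max_match(text, dictionary, max_len=None):
    """Segment `text` right-to-left with a (domain) dictionary.

    At each step the longest dictionary entry ending at the current
    position is taken; if none matches, a single character is emitted.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # tokens were collected from right to left
    return tokens


if __name__ == "__main__":
    toy_dictionary = {"研究", "研究生", "生命", "起源"}
    print(reverse_max_match("研究生命的起源", toy_dictionary))
```

On this sentence the right-to-left scan yields 研究 / 生命 / 的 / 起源, whereas a left-to-right longest match with the same dictionary would start with 研究生. This is the kind of overlap ambiguity that dictionary-based matching has to resolve.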