Computer Engineering and Applications, 2008, Vol. 44, Issue (32): 137-139. DOI: 10.3778/j.issn.1002-8331.2008.32.041

• Database, Signal and Information Processing •

Chinese base phrase parsing based on maximum entropy model

ZHU Chong1, WANG Da-wei2,3, ZHANG Xiang-li1

  1. Information & Communication College, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Department of Automation, University of Science and Technology of China, Hefei 230026, China
  3. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
  • Received: 2007-12-11 Revised: 2008-02-27 Online: 2008-11-11 Published: 2008-11-11
  • Contact: ZHU Chong

基于最大熵方法汉语基本短语分析

朱 冲1,王大为2,3,张向利1   

  1. 桂林电子科技大学 信息与通信学院,广西 桂林 541004
  2. 中国科学技术大学 自动化系,合肥 230026
  3. 中国科学院 合肥智能机械研究所,合肥 230031
  • 通讯作者: 朱 冲

Abstract: This paper presents a model for parsing Chinese base phrases that separates the prediction of phrase boundaries from the tagging of phrase types. The two steps are assumed to be independent, and a maximum entropy (ME) model is built for each. The key issue in an ME model is how to select effective features; the feature spaces for the two steps are given, together with the feature selection procedure and algorithm. Experimental results show that the model achieves a precision of 95.27% for phrase boundary prediction and 96.2% for phrase tagging.

Key words: phrase parsing, latent syntax, maximum entropy principle
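
The two-step scheme described in the abstract can be illustrated with a minimal sketch: one maximum entropy classifier marks where each phrase begins, and a second, independent one assigns a type to each resulting phrase. Everything concrete here is an assumption made for illustration and is not taken from the paper: the part-of-speech tags, the B/I boundary labels, the NP/VP phrase types, the feature templates, the toy training data, and the plain gradient-ascent training. The paper's feature selection algorithm is not reproduced.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Conditional maximum entropy classifier over binary indicator features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        # p(y | x) is proportional to exp(sum of weights of active features)
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=200, lr=0.5):
        # Stochastic gradient ascent on the conditional log-likelihood:
        # gradient = observed feature count - expected feature count.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y, py in zip(self.labels, p):
                    g = (1.0 if y == gold else 0.0) - py
                    for f in feats:
                        self.w[(f, y)] += lr * g

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[p.index(max(p))]


def boundary_feats(pos, i):
    # Hypothetical feature templates: current tag, previous tag, and their pair.
    prev = pos[i - 1] if i > 0 else "<s>"
    return [f"pos={pos[i]}", f"prev={prev}", f"prev|pos={prev}|{pos[i]}"]


def type_feats(chunk):
    # Hypothetical feature templates: first and last tag of the phrase.
    return [f"first={chunk[0]}", f"last={chunk[-1]}"]


def split(pos, marks):
    # Group tags into phrases; "B" opens a new phrase, "I" continues one.
    chunks = []
    for p, m in zip(pos, marks):
        if m == "B" or not chunks:
            chunks.append([p])
        else:
            chunks[-1].append(p)
    return chunks


# Toy training data (invented): POS tags, boundary marks, phrase types.
TRAIN = [
    (["m", "q", "n", "v", "a", "n"], ["B", "I", "I", "B", "B", "I"], ["NP", "VP", "NP"]),
    (["n", "d", "v", "m", "q", "n"], ["B", "B", "I", "B", "I", "I"], ["NP", "VP", "NP"]),
]

# Step 1: boundary model, trained on per-token decisions.
boundary = MaxEnt(["B", "I"])
boundary.train([(boundary_feats(pos, i), marks[i])
                for pos, marks, _ in TRAIN for i in range(len(pos))])

# Step 2: phrase-type model, trained independently on gold phrases.
typer = MaxEnt(["NP", "VP"])
typer.train([(type_feats(c), t)
             for pos, marks, types in TRAIN
             for c, t in zip(split(pos, marks), types)])


def parse(pos):
    # Predict boundaries first, then tag each predicted phrase.
    marks = [boundary.predict(boundary_feats(pos, i)) for i in range(len(pos))]
    return [(typer.predict(type_feats(c)), c) for c in split(pos, marks)]
```

Calling `parse` on a list of POS tags returns a list of `(phrase_type, tags)` pairs. Because the two classifiers share no parameters, each can be trained and evaluated separately, which is what allows boundary precision and tagging precision to be reported as two distinct figures.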

摘要: 提出了一个汉语基本短语分析模型,将汉语短语的边界划分和短语标识分开,假定这两个过程相互独立,采用最大熵方法分别建立模型解决。最大熵模型的关键是如何选取有效的特征,文中给出了两个步骤相关的特征空间以及特征选择过程和算法。实验表明,模型的短语定界精确率达到95.27%,标注精确率达到96.2%。

关键词: 短语分析, 潜层句法, 最大熵原理