Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (26): 118-120.DOI: 10.3778/j.issn.1002-8331.2009.26.035

• Database and Information Processing •

Semantic chunks segmentation based on maximum entropy model

XIE Fa-kui1,2,ZHANG Quan2   

  1. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
  2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2008-05-19 Revised:2008-08-12 Online:2009-09-11 Published:2009-09-11
  • Contact: XIE Fa-kui

Original Chinese title: 基于最大熵模型的语义块切分

Authors (Chinese): 谢法奎, 张全

Abstract: Semantic chunk segmentation is an important task in the Hierarchical Network of Concepts (HNC) theory. Unlike earlier approaches, this paper addresses the problem with statistical modeling. Feature templates are built from word, part-of-speech (POS), and concept information, and features are selected incrementally. On this basis, a semantic chunk segmentation system based on a maximum entropy model is constructed. Experiments on the HNC annotated corpus show that the model works well: in the open test, precision and recall reach 83.78% and 91.17%, respectively.
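To make the pipeline in the abstract more concrete, the following is a minimal sketch of how feature templates over word, POS, and concept information might feed a binary maximum entropy boundary classifier, and how precision and recall over predicted chunk boundaries could be computed. Everything here is an assumption for illustration: the token layout, the template set, the label names `B` (boundary) and `N` (no boundary), and the function names are hypothetical and are not taken from the paper. Training and the incremental feature selection step are not shown.

```python
import math

# Hypothetical token layout: (word, POS tag, concept label).
# The tags and concept labels are invented for this sketch.

def extract_features(tokens, i):
    """Apply a few illustrative feature templates at the gap before tokens[i]."""
    prev_w, prev_p, prev_c = tokens[i - 1]
    next_w, next_p, next_c = tokens[i]
    return [
        f"w-1={prev_w}", f"w0={next_w}",      # word templates
        f"p-1={prev_p}", f"p0={next_p}",      # POS templates
        f"p-1|p0={prev_p}|{next_p}",          # combined POS template
        f"c-1={prev_c}", f"c0={next_c}",      # concept templates
    ]

def boundary_probability(features, weights):
    """Maximum entropy form: p(B | x) = exp(s_B) / (exp(s_B) + exp(s_N)),
    where s_y is the sum of the weights of (feature, y) pairs active for x."""
    scores = {"B": 0.0, "N": 0.0}
    for label in scores:
        for feat in features:
            scores[label] += weights.get((feat, label), 0.0)
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores["B"]) / z

def precision_recall(predicted, gold):
    """Precision and recall over sets of predicted and gold boundary positions."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy usage with made-up tokens and a single hand-set weight.
tokens = [("he", "PRON", "person"), ("reads", "VERB", "action"), ("books", "NOUN", "object")]
feats = extract_features(tokens, 1)
weights = {("p-1|p0=PRON|VERB", "B"): 2.0}
p_boundary = boundary_probability(feats, weights)
p, r = precision_recall({1, 3, 5}, {1, 3, 4, 6})
```

With no weights set, the model assigns probability 0.5 to a boundary; a positive weight on a (feature, `B`) pair raises it. In a real system the weights would be estimated from annotated data rather than set by hand.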

Key words: maximum entropy model, semantic chunk, Hierarchical Network of Concepts (HNC)

