Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (9): 134-137. DOI: 10.3778/j.issn.1002-8331.2010.09.038

• Database, Signal and Information Processing •

Feature engineering for predicate identification and classification in semantic analysis

WANG Hong-lin (汪红林)¹,², WANG Hong-ling (王红玲)¹,², ZHOU Guo-dong (周国栋)¹,²

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Suzhou, Jiangsu 215006, China
  • Received: 2008-09-26 Revised: 2009-01-17 Online: 2010-03-21 Published: 2010-03-21
  • Contact: WANG Hong-lin

Abstract: The predicate is the most important component of a sentence, and whether it is identified correctly has a large effect on semantic analysis. The performance of predicate identification and classification depends on a large number of features, so how those features are organized and combined matters a great deal. This paper selects 7 basic features and more than 30 new features, together with combinations of them. Using a maximum entropy classifier and adding beneficial features to the baseline system, the F1-score of predicate identification rises by about 5 percentage points (from 84.7% to 89.8%), and the F1-score of predicate classification (word sense identification) rises by about 2 percentage points (from 80.3% to 82.1%). The results show that the new features and their combinations substantially improve performance.

Key words: predicate identification and predicate classification, semantic analysis, feature engineering, maximum entropy classifier
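
The abstract describes starting from a set of basic features and adding those new features that help. The paper's exact selection procedure is not given on this page, so the following is only a minimal sketch of one common way to do this, greedy forward selection. The feature names, the `evaluate` callback, and the toy scores are all hypothetical; in practice `evaluate` would train a maximum entropy classifier on the given feature set and return its F1-score on held-out data.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def greedy_forward_selection(
    baseline: Iterable[str],
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Add candidate features one at a time while the score keeps improving.

    `evaluate` is a hypothetical callback (not from the paper): it should
    train a classifier on the given feature set and return its F1-score.
    """
    selected = list(baseline)
    remaining = list(candidates)
    best = evaluate(frozenset(selected))
    while remaining:
        # Score each remaining candidate when added to the current set.
        scored = [(evaluate(frozenset(selected + [c])), c) for c in remaining]
        top_score, top_feature = max(scored)
        if top_score <= best:
            break  # no remaining candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy stand-in for "train and score"; the gains below are invented
# purely for illustration and are not results from the paper.
TOY_GAIN = {"pos_of_head": 0.03, "word+pos": 0.02, "noisy_feature": -0.01}


def toy_evaluate(features: FrozenSet[str]) -> float:
    return 0.80 + sum(TOY_GAIN.get(f, 0.0) for f in features)


selected, score = greedy_forward_selection(
    baseline=["word", "pos"],        # hypothetical basic features
    candidates=list(TOY_GAIN),       # hypothetical new features
    evaluate=toy_evaluate,
)
print(selected, round(score, 2))
```

In this toy run the two helpful candidates are kept and `noisy_feature` is rejected, because adding it lowers the score. Each round retrains once per remaining candidate, so with 30 or more candidates the cost of `evaluate` dominates the run time.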
