Computer Engineering and Applications, 2008, Vol. 44, Issue (33): 144-147. DOI: 10.3778/j.issn.1002-8331.2008.33.045

• Database, Signal and Information Processing •

Feature selection using syntactic and semantic information in question classification

YUAN Xiao-jie, SHI Jian-xing, NING Hua, YU Shi-tao

  1. College of Information Technical Science, Nankai University, Tianjin 300071, China
  • Received: 2007-12-17 Revised: 2008-03-07 Online: 2008-11-21 Published: 2008-11-21
  • Contact: YUAN Xiao-jie

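Chinese title: 问题分类中基于句法和语义信息的特征选择 (Feature selection based on syntactic and semantic information in question classification)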

袁晓洁,师建兴,宁 华,于士涛   

  1. College of Information Technical Science, Nankai University, Tianjin 300071
  • Corresponding author: 袁晓洁 (YUAN Xiao-jie)

Abstract: Question classification is a very important sub-module of a question answering system, and its key lies in feature selection. This paper proposes a new feature selection method based on syntactic and semantic information, using the question word, the main verb of the question, the dependency structure, the main noun, and the top hypernym of that noun as features for classification. The effect of the feature selection is evaluated with KNN and Naïve Bayes classifiers, and the results meet expectations. On the predefined question taxonomy, the classification accuracy reaches 82.2% and 83.7% respectively, which is better than the method using bag-of-words features.

Key words: question answering system, question classification, feature selection, dependency structure, hypernym

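Abstract (translated from the Chinese): Question classification is a very important sub-module of a question answering system, and its key lies in selecting the features of the question. Taking both the syntactic and the semantic information of a question into account, this paper proposes a new method that classifies questions using the question word, the dependency relations, the main verb, the head noun, and the top-level hypernym of that noun as features. In the experiments, the method is tested with two classification algorithms, k-nearest neighbor and Naïve Bayes, and the results show that it classifies well. On a self-defined taxonomy, it reaches classification accuracies of 82.2% and 83.7% respectively, outperforming the feature selection method based on bag-of-words.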

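Key words (translated from the Chinese): question answering system, question classification, feature selection, dependency relation, hypernym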
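
As an illustration of the classification step described in the abstract, the following is a minimal sketch of a Naïve Bayes classifier over the five named features. It is not the authors' implementation. The feature values are hand-written toy examples rather than the output of a real parser or hypernym lookup, the class labels (`HUMAN`, `LOCATION`, `NUMBER`) are hypothetical because the paper's taxonomy is not listed here, and the `dependency` feature is simplified to a single relation label as an assumption.

```python
import math
from collections import Counter, defaultdict

# Feature names follow the abstract; everything else in this sketch
# (values, labels, smoothing choice) is an illustrative assumption.
FEATURES = ("question_word", "main_verb", "dependency", "main_noun", "top_hypernym")


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[label][feature][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.vocab = defaultdict(set)
        for row, label in zip(rows, labels):
            for name in FEATURES:
                self.value_counts[label][name][row[name]] += 1
                self.vocab[name].add(row[name])
        return self

    def log_scores(self, row):
        """Return the unnormalized log posterior of each class for one row."""
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)
            for name in FEATURES:
                count = self.value_counts[label][name][row[name]]
                # The extra +1 reserves probability mass for unseen values.
                denom = n_label + len(self.vocab[name]) + 1
                score += math.log((count + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


def make_row(question_word, main_verb, dependency, main_noun, top_hypernym):
    """Bundle one question's hand-assigned feature values into a dict."""
    return dict(zip(FEATURES, (question_word, main_verb, dependency,
                               main_noun, top_hypernym)))


# Toy training set: invented for illustration only.
train_rows = [
    make_row("who", "invent", "nsubj", "telephone", "entity"),
    make_row("who", "write", "nsubj", "novel", "entity"),
    make_row("where", "locate", "advmod", "city", "entity"),
    make_row("where", "live", "advmod", "penguin", "entity"),
    make_row("how_many", "have", "amod", "leg", "entity"),
    make_row("how_many", "contain", "amod", "page", "abstraction"),
]
train_labels = ["HUMAN", "HUMAN", "LOCATION", "LOCATION", "NUMBER", "NUMBER"]

clf = CategoricalNaiveBayes().fit(train_rows, train_labels)
query = make_row("who", "discover", "nsubj", "penicillin", "entity")
print(clf.predict(query))
```

In a fuller pipeline the feature values would come from a dependency parser and a lexical resource that provides hypernym chains. The same feature dictionaries could also be fed to a k-nearest-neighbor classifier with a simple overlap distance, which is the other classifier the abstract mentions.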