Computer Engineering and Applications (计算机工程与应用), 2006, Vol. 42, Issue 28: 9-

• Doctoral Forum •

Subcategorization Acquisition Based on Weakly Supervised SVM for Chinese Verbs

Han Xiwu, Zhao Tiejun

  1. School of Computer Science, Harbin Institute of Technology
  • Received: 2006-05-26   Online: 2006-10-01   Published: 2006-10-01
  • Corresponding author: Han Xiwu (frank6196)

Abstract: Automatic acquisition of verb subcategorization typically involves two steps: (1) subcategorization hypotheses are generated according to heuristic rules; (2) the hypotheses are filtered with statistical methods, and reliable subcategorization types are selected. Previous efforts to improve acquisition performance have concentrated on the statistical filtering step, and in the relevant experiments the hypothesis-generation step involved no supervised training; all of these methods are therefore unsupervised. This paper proposes a weakly supervised scheme for acquiring the subcategorization of Chinese verbs, in which an SVM classifier replaces the unsupervised hypothesis generator of the original system. Experimental results show a statistically significant improvement in overall acquisition performance.

Key words: Chinese verbs, subcategorization, weakly supervised, SVM
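
The abstract describes a two-step pipeline, hypothesis generation followed by statistical filtering, but specifies neither the features, the SVM configuration, nor the statistical filter. The Python sketch below illustrates only that control flow, under stated assumptions. The generator is any callable that maps an occurrence's context to a candidate frame: heuristic rules in the unsupervised baseline, or a trained classifier such as an SVM in the weakly supervised variant. The rule-based `classify_toy`, the context tags `NP`/`CLAUSE`, the frame labels, and the `error_prob` value are all hypothetical. The filter is a binomial hypothesis test, a common choice in the subcategorization literature, and it is not necessarily the filter the authors used.

```python
from collections import Counter
from math import comb
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical representation: a verb occurrence is (verb, context tags).
Occurrence = Tuple[str, Tuple[str, ...]]
# Hypothesis generator: maps context tags to a candidate frame, or None.
Classifier = Callable[[Tuple[str, ...]], Optional[str]]


def generate_hypotheses(
    occurrences: Iterable[Occurrence], classify: Classifier
) -> Tuple[Dict[str, Counter], Counter]:
    """Step 1: propose a frame for each occurrence and count per verb."""
    frame_counts: Dict[str, Counter] = {}
    verb_totals: Counter = Counter()
    for verb, tags in occurrences:
        verb_totals[verb] += 1  # n: all occurrences of the verb
        frame = classify(tags)
        if frame is not None:  # m: occurrences hypothesised as this frame
            frame_counts.setdefault(verb, Counter())[frame] += 1
    return frame_counts, verb_totals


def binomial_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def filter_hypotheses(
    frame_counts: Dict[str, Counter],
    verb_totals: Counter,
    error_prob: float,
    alpha: float = 0.05,
) -> Dict[str, List[str]]:
    """Step 2 (binomial hypothesis test): keep frame f for verb v only if
    seeing f at least m times in n occurrences is unlikely (< alpha) under
    the null hypothesis that each f-hypothesis is noise with probability
    error_prob."""
    return {
        verb: sorted(
            frame
            for frame, m in frames.items()
            if binomial_tail(m, verb_totals[verb], error_prob) < alpha
        )
        for verb, frames in frame_counts.items()
    }


def classify_toy(tags: Tuple[str, ...]) -> Optional[str]:
    """Hypothetical stand-in for the generator (rules or a trained SVM)."""
    if "CLAUSE" in tags:
        return "S"
    if "NP" in tags:
        return "NP"
    return None


# Toy corpus: 吃 "eat" mostly takes an NP object; 认为 "think" a clause.
corpus: List[Occurrence] = (
    [("吃", ("NP",))] * 8 + [("吃", ("CLAUSE",)), ("吃", ())]
    + [("认为", ("CLAUSE",))] * 9 + [("认为", ("NP",))]
)
counts, totals = generate_hypotheses(corpus, classify_toy)
result = filter_hypotheses(counts, totals, error_prob=0.05)
print(result)  # {'吃': ['NP'], '认为': ['S']}
```

In this toy run the single spurious hypotheses (an `S` for 吃, an `NP` for 认为) have a tail probability of about 0.40 and are rejected, while the frequent frames survive. Replacing `classify_toy` with a trained classifier changes only step 1 and leaves the filter untouched. This matches the abstract's description of swapping out the hypothesis-generation module.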