Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (21): 171-173. DOI: 10.3778/j.issn.1002-8331.2008.21.047

• Machine Learning •

SVM-Adaboost based Chinese text chunking

BIE Zhi¹, ZHOU Jun-sheng², CHEN Jia-jun¹

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  2. Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
  • Received: 2008-04-30 Revised: 2008-05-29 Online: 2008-07-21 Published: 2008-07-21
  • Contact: BIE Zhi

基于SVM-Adaboost的中文组块分析

别 致1,周俊生2,陈家骏1   

  1. 南京大学 计算机软件新技术国家重点实验室,南京 210093
  2. 南京师范大学 计算机科学系,南京 210097
  • 通讯作者: 别 致

Abstract: Text chunking is an important preprocessing step for syntactic parsing. It divides text into non-overlapping, syntactically related chunks in order to reduce the complexity of full parsing. This paper applies an SVM-Adaboost algorithm, which combines AdaBoost with linear-kernel SVMs, to Chinese text chunking. The algorithm uses linear-kernel SVMs as the component (weak) classifiers of AdaBoost and adjusts their kernel parameter during learning. The experimental results show that the approach is effective.

Key words: Chinese text chunking, Adaboost, support vector machine

摘要: 组块分析是一种非常重要的句法分析预处理手段,通过将文本划分成一组互不重叠的片断,来达到降低句法分析的难度。提出一种基于SVM-Adaboost的中文组块分析方法,将基于线性核函数的支持向量机与Adaboost算法相结合,以基于线性核函数的SVM作为Adaboost的分量分类器,在学习过程中改变分量分类器的核参数。实验结果表明了该算法的有效性。

关键词: 中文组块分析, Adaboost, 支持向量机
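
The combination described in the abstract, AdaBoost with linear SVMs as component classifiers and a parameter that changes between rounds, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which parameter is adjusted or by what rule, so the sketch assumes the soft-margin constant `C` is multiplied by a hypothetical factor `C_growth` each round. The function names, the subgradient-descent trainer, and the toy binary data are likewise assumptions made for illustration; real chunking would need multi-class tagging over word and part-of-speech features.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(z):
    return 1 if z >= 0 else -1

def train_linear_svm(X, y, weights, C, epochs=200, lr=0.1):
    """Weighted soft-margin linear SVM fitted by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0      # gradient of the 0.5*||w||^2 term
        for xi, yi, di in zip(X, y, weights):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active for this sample
                scale = C * n * di * yi    # n * di equals 1 under uniform weights
                for j in range(d):
                    grad_w[j] -= scale * xi[j]
                grad_b -= scale
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def adaboost_svm(X, y, rounds=5, C=0.1, C_growth=2.0):
    """AdaBoost over linear SVMs; labels in y must be +1 or -1."""
    n = len(X)
    dist = [1.0 / n] * n                   # sample distribution, starts uniform
    ensemble = []
    for _ in range(rounds):
        w, b = train_linear_svm(X, y, dist, C)
        preds = [sign(dot(w, xi) + b) for xi in X]
        err = sum(di for di, p, yi in zip(dist, preds, y) if p != yi)
        C *= C_growth                      # assumption: parameter changes every round
        if err >= 0.5:
            continue                       # no better than chance: retrain with new C
        clipped = max(err, 1e-10)          # avoid division by zero when err == 0
        alpha = 0.5 * math.log((1 - clipped) / clipped)
        ensemble.append((alpha, w, b))
        if err == 0:
            break                          # training set already fitted
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        dist = [di * math.exp(-alpha * yi * p) for di, p, yi in zip(dist, preds, y)]
        total = sum(dist)
        dist = [di / total for di in dist]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the component classifiers."""
    score = sum(alpha * sign(dot(w, x) + b) for alpha, w, b in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    model = adaboost_svm(X, y)
    print([predict(model, x) for x in X])
```

On linearly separable toy data like this the first component classifier already fits the training set, so boosting stops after one round; the reweighting step only matters when a single linear SVM makes errors.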