Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (2): 147-149. DOI: 10.3778/j.issn.1002-8331.2009.02.043

• Database, Signal and Information Processing •

Web information extraction based on Transductive Support Vector Machine

XIAO Jian-peng, ZHANG Lai-shun, REN Xing

  1. Institute of Electronic Technology, the PLA Information Engineering University, Zhengzhou 450004, China
  • Received: 2007-12-27 Revised: 2008-03-17 Online: 2009-01-11 Published: 2009-01-11
  • Contact: XIAO Jian-peng

Chinese title: 直推式支持向量机在Web信息抽取中的应用研究 (Research on the Application of Transductive Support Vector Machines to Web Information Extraction)

Chinese author names: 肖建鹏, 张来顺, 任星

Abstract: A Transductive Support Vector Machine (TSVM) is a classification technique that labels a specific set of unknown samples directly from the known samples. Based on an analysis of the TSVM classification principle, this paper proposes a TSVM-based Web information extraction method that treats extraction directly as a classification problem. The method needs only a small number of labeled samples to classify and label a large number of unlabeled samples, and thereby completes the Web data extraction task through classification. Experimental results show that the method is effective for Web information extraction.

Key words: Web information extraction, classification learning, Transductive Support Vector Machine (TSVM)
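
For readers unfamiliar with the classification principle the abstract refers to, the block below sketches the widely used soft-margin formulation of the TSVM training problem. The abstract does not state which formulation the paper adopts, so this is an illustrative assumption. All symbols are introduced here and do not come from the source: x_i and y_i are the n labeled samples and their labels, x*_j are the k unlabeled samples, y*_j are the labels to be assigned to them, ξ and ξ* are slack variables, and C and C* are penalty weights.

```latex
\begin{aligned}
\min_{y_1^*,\dots,y_k^*,\; \mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*} \quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
    + C \sum_{i=1}^{n} \xi_i
    + C^* \sum_{j=1}^{k} \xi_j^* \\
\text{subject to} \quad
  & y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n \\
  & y_j^*(\mathbf{w}\cdot\mathbf{x}_j^* + b) \ge 1 - \xi_j^*, \quad \xi_j^* \ge 0, \quad j = 1,\dots,k \\
  & y_j^* \in \{-1, +1\}, \quad j = 1,\dots,k
\end{aligned}
```

Because the labels y*_j of the unlabeled samples are themselves optimization variables, the separating hyperplane is fitted to the labeled and unlabeled samples together, which is why a small labeled set can suffice to label a much larger unlabeled one.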