Computer Engineering and Applications, 2011, Vol. 47, Issue (32): 123-127.

Section: Database, Signal and Information Processing

Tensor maximum margin projection for Web document classification

WANG Ziqiang (1, 2), SUN Xia (2), QIAN Xu (1)

  1. College of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
  2. School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
  Online: 2011-11-11  Published: 2011-11-11

Abstract: Traditional document dimensionality reduction algorithms based on vector representation suffer from the curse of dimensionality and the matrix singularity problem. To overcome these drawbacks, this paper proposes a novel Web document classification algorithm based on tensor maximum margin projection. During dimensionality reduction, the algorithm makes full use of the structural and association information of documents, which improves its discriminant ability. Experimental results on the WebKB and 20NG document collections show that the proposed algorithm outperforms other conventional document classification algorithms.
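The abstract does not give the algorithm itself, so the following is only a generic sketch of the second-order tensor (matrix) representation used by tensor subspace methods. It is not the paper's TMMP procedure. The sketch assumes that a term vector of length n = n1 × n2 is folded into an n1 × n2 matrix X and then reduced by two projection matrices, Y = UᵀXV. All helper names and sizes are hypothetical: a 10000-term vocabulary, a 100 × 100 folding, and a 10 × 10 output. In TMMP, U and V would be learned from the maximum margin criterion. Here they are fixed by hand purely to show the shapes involved.

```python
# Hypothetical illustration of vector vs. second-order tensor projection.
# Not the paper's TMMP: U and V below are hand-picked, not learned.

def vector_projection_params(n, d):
    """Number of entries in an n x d projection matrix W used as y = W^T x."""
    return n * d

def tensor_projection_params(n1, n2, l1, l2):
    """Number of entries in U (n1 x l1) and V (n2 x l2) used as Y = U^T X V."""
    return n1 * l1 + n2 * l2

def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Plain-Python product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def reshape(vec, n1, n2):
    """Fold a length n1*n2 term-weight vector into an n1 x n2 matrix, row by row."""
    if len(vec) != n1 * n2:
        raise ValueError("vector length must equal n1 * n2")
    return [vec[i * n2:(i + 1) * n2] for i in range(n1)]

def tensor_project(X, U, V):
    """Two-sided projection Y = U^T X V of an n1 x n2 document matrix to l1 x l2."""
    return matmul(matmul(transpose(U), X), V)

if __name__ == "__main__":
    # Hypothetical sizes: 10000 terms folded into 100 x 100, reduced to
    # 100 features either as a 100-dim vector or as a 10 x 10 matrix.
    print(vector_projection_params(10000, 100))        # 1000000
    print(tensor_projection_params(100, 100, 10, 10))  # 2000
    # Toy document: 6 term weights folded into 2 x 3, projected to 1 x 1.
    X = reshape([1, 2, 3, 4, 5, 6], 2, 3)
    U = [[1], [0]]        # hand-picked, not learned
    V = [[1], [1], [1]]   # hand-picked, not learned
    print(tensor_project(X, U, V))                     # [[6]]
```

The parameter counts show why a two-sided projection is typically less exposed to the curse of dimensionality. It estimates n1·l1 + n2·l2 entries instead of n·d. Its scatter-type matrices are n1 × n1 and n2 × n2 rather than n × n, so they are less likely to be singular when training documents are few.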

Key words: document classification, maximum margin projection, data mining, manifold learning