Computer Engineering and Applications, 2008, Vol. 44, Issue (29): 150-152. DOI: 10.3778/j.issn.1002-8331.2008.29.042

• Database, Signal and Information Processing •

Research of text categorization based on CCIPCA and ICA

HE Hai-bin,LI Xin-fu,ZHAO Lei-lei   

  1. Faculty of Mathematics and Computer Science,Hebei University,Baoding,Hebei 071002,China
  • Received:2008-04-23 Revised:2008-07-21 Online:2008-10-11 Published:2008-10-11
  • Contact: HE Hai-bin

Research on text categorization based on dimension reduction with CCIPCA and ICA

何海斌,李新福,赵蕾蕾   

  1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
  • Corresponding author: 何海斌

Abstract: The vector space model is commonly used to represent text in text categorization. Dimension reduction is a key problem in practical text categorization. Classical decomposition algorithms cannot handle such large-scale, high-dimensional problems. This paper presents an approach that reduces the dimensionality of the feature space by combining candid covariance-free incremental principal component analysis (CCIPCA) with independent component analysis (ICA). The experimental results show that the proposed dimension reduction method is feasible and effective.

Key words: text categorization, dimension reduction, Candid Covariance-free Incremental Principal Component Analysis (CCIPCA), Independent Component Analysis (ICA), Support Vector Machine (SVM)

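Abstract (Chinese version): Text categorization uses the vector space model to represent text features, which yields a very large number of dimensions. The key issue is reducing the dimensionality of this high-dimensional feature set, yet general decomposition algorithms cannot handle large-scale, high-dimensional problems. A feature extraction method that combines CCIPCA with ICA can effectively reduce the dimensionality of text features. Experimental results show that dimension reduction improves both the efficiency and the effectiveness of the classifier.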

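Key words (Chinese version): text categorization, feature dimension reduction, candid covariance-free incremental principal component analysis, independent component analysis, support vector machine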
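
The abstract names CCIPCA as the first dimension reduction stage but gives no details. The following is a minimal sketch of the standard CCIPCA update in plain Python, not the authors' implementation. It assumes the amnesic parameter is fixed at 0 and that the mean is tracked incrementally. The names `dot`, `ccipca`, and `project` are illustrative, and the toy data stands in for real term-weight vectors.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def ccipca(samples, k):
    """Estimate the mean and up to k principal directions in one pass.

    Sketch of the CCIPCA update (amnesic parameter assumed to be 0).
    Each sample updates the direction estimates directly, so no
    covariance matrix is ever formed.
    """
    dim = len(samples[0])
    mean = [0.0] * dim
    vs = []  # unnormalised direction estimates v_1 .. v_k
    for n, x in enumerate(samples, start=1):
        # Update the running mean, then centre the current sample.
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        u = [xi - m for xi, m in zip(x, mean)]
        for i in range(k):
            if i == len(vs):
                # Initialise the next direction with the current residual.
                if dot(u, u) > 1e-12:
                    vs.append(list(u))
                break
            v = vs[i]
            pull = dot(u, v) / math.sqrt(dot(v, v))
            # v(n) = (n-1)/n * v(n-1) + 1/n * u * (u . v(n-1)) / |v(n-1)|
            v = [(n - 1) / n * vj + pull / n * uj for vj, uj in zip(v, u)]
            vs[i] = v
            # Deflate: remove the part of u explained by this direction.
            norm = math.sqrt(dot(v, v))
            coeff = dot(u, v) / norm
            u = [uj - coeff * vj / norm for uj, vj in zip(u, v)]
    units = []
    for v in vs:
        norm = math.sqrt(dot(v, v))
        units.append([vj / norm for vj in v])
    return mean, units


def project(x, mean, units):
    """Map one sample onto the estimated directions (reduced features)."""
    centred = [xi - m for xi, m in zip(x, mean)]
    return [dot(centred, e) for e in units]


if __name__ == "__main__":
    random.seed(0)
    # Toy "documents": three term weights varying mostly along one direction.
    docs = []
    for _ in range(300):
        t = random.gauss(0, 1)
        docs.append([t + random.gauss(0, 0.1),
                     2 * t + random.gauss(0, 0.1),
                     random.gauss(0, 0.1)])
    mean, units = ccipca(docs, k=2)
    print(project(docs[0], mean, units))
```

The reduced vectors returned by `project` would then be passed to an ICA step and finally to an SVM classifier. Those two stages are omitted here; library implementations such as scikit-learn's `FastICA` and `SVC` are one possible choice, though the abstract does not say which implementations the authors used.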