Computer Engineering and Applications, 2018, Vol. 54, Issue (14): 95-99. DOI: 10.3778/j.issn.1002-8331.1705-0170

• Big Data and Cloud Computing •

Text classification based on PSO-ICA

QIU Guoqing, ZHANG Shaoyun, ZHAO Wanying, MA Jun   

  1. College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2018-07-15  Published: 2018-08-06

Abstract: When text classification uses the Vector Space Model (VSM) to represent text features, the resulting feature vectors tend to be high-dimensional and sparse. To simplify the original text feature vectors effectively, a dimensionality reduction method based on Independent Component Analysis (ICA) optimized by Particle Swarm Optimization (PSO) is proposed and applied to text classification. In this algorithm, negentropy serves as the fitness function of the particle swarm; as a measure of non-Gaussianity, it is used as the independence criterion for adaptively updating the separation matrix. Experimental results show that, compared with conventional feature dimensionality reduction methods, the proposed method handles the dimensionality reduction of high-dimensional text feature vectors and significantly improves the efficiency and accuracy of text classification.

Key words: particle swarm optimization, independent component analysis, dimension reduction, text classification
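
The abstract does not give the update equations, so the following is only a minimal sketch of the general idea it describes: a particle swarm searches for a separation vector whose projection of the whitened data has maximal negentropy. Several things here are assumptions and not taken from the paper: the log cosh approximation of negentropy, a standard global-best PSO with an inertia weight, the extraction of a single separation vector instead of the full separation matrix, the hypothetical function names `whiten` and `pso_ica_component`, all parameter values, and the synthetic two-source data.

```python
import numpy as np


def whiten(X):
    """Center X (n_features, n_samples) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
    return (eigvecs / np.sqrt(eigvals)).T @ Xc


def pso_ica_component(Z, n_particles=30, n_iter=100,
                      inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for one unit-norm separation vector that maximizes negentropy.

    Assumed setup (not from the paper): global-best PSO, log cosh contrast.
    """
    rng = np.random.default_rng(seed)
    # Monte Carlo estimate of E[G(v)] for a standard Gaussian v, G(u) = log cosh(u).
    gauss_ref = np.mean(np.log(np.cosh(rng.standard_normal(100_000))))

    def fitness(w):
        # Negentropy approximation J(y) ~ (E[G(y)] - E[G(v)])^2 for y = w^T z.
        return (np.mean(np.log(np.cosh(w @ Z))) - gauss_ref) ** 2

    def to_sphere(P):
        # Project each particle back onto the unit sphere.
        return P / np.linalg.norm(P, axis=1, keepdims=True)

    pos = to_sphere(rng.standard_normal((n_particles, Z.shape[0])))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = to_sphere(pos + vel)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest


# Synthetic check: mix two non-Gaussian sources, then recover one of them.
rng = np.random.default_rng(42)
n = 5000
S = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # arbitrary mixing matrix
Z = whiten(A @ S)
w = pso_ica_component(Z)
y = w @ Z
corr = max(abs(np.corrcoef(y, s)[0, 1]) for s in S)
print(f"best |correlation| with a true source: {corr:.3f}")
```

In a text classification setting, `Z` would instead hold whitened document feature vectors (for example TF-IDF weights), and the recovered components would serve as the reduced features passed to the classifier. Extracting several components would additionally require decorrelating each new vector from those already found, which this sketch leaves out.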