Computer Engineering and Applications, 2010, Vol. 46, Issue (35): 129-132. DOI: 10.3778/j.issn.1002-8331.2010.35.037

• Database, Signal and Information Processing •

Matrix projection algorithm for text classification

ZHONG Jiang, SUN Qi-gan, LI Jing

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2010-06-04 Revised: 2010-08-20 Online: 2010-12-11 Published: 2010-12-11
  • Contact: ZHONG Jiang

Abstract: This paper studies dimensionality reduction and ways to improve accuracy and efficiency in text classification, and proposes a new classification algorithm based on matrix projection, the Matrix Projection (MP) algorithm. A matrix operation projects the three-dimensional space that represents text features in the training samples onto a two-dimensional space and yields normalized vectors, which both reduces the feature dimensionality and gives accurate feature-term weights. Experiments comparing MP with several other text classification algorithms show that it clearly improves both classification accuracy and running time, with macro-averaged F1 values of 92.29% and 96.03% on the two data sets.
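
The F1 values reported in the abstract are macro-averaged: F1 is computed separately for each class and the per-class values are then averaged with equal weight. The sketch below shows that standard metric only. It is not the MP algorithm, and the function name `macro_f1` and the toy labels are illustrative assumptions that do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[guess] += 1      # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    classes = set(y_true) | set(y_pred)
    # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero for any
    # class that appears at least once in y_true or y_pred.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not data from the paper.
y_true = ["sports", "sports", "finance", "finance", "tech", "tech"]
y_pred = ["sports", "finance", "finance", "finance", "tech", "sports"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class carries the same weight regardless of how many documents it contains, a macro-averaged score is sensitive to performance on small categories, which is one reason it is commonly reported for text classification corpora with uneven class sizes.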

Key words: text classification, vector space model, matrix projection, feature selection

CLC Number: