Computer Engineering and Applications, 2010, Vol. 46, Issue (26): 102-104. DOI: 10.3778/j.issn.1002-8331.2010.26.032

• Network, Communication and Security •

New text categorization model based on cascade neural network and SVD

WANG Yan-xia, DENG Wei

  1. School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2009-02-27 Revised: 2009-04-07 Online: 2010-09-11 Published: 2010-09-11
  • Contact: WANG Yan-xia

Abstract: A new text categorization model based on a cascade-correlation neural network (CCNN) and Singular Value Decomposition (SVD) is proposed. The neural network is trained with the cascade-correlation algorithm. Most classic classification systems represent the contents of documents with a set of index terms, an approach known as the Vector Space Model (VSM). However, this method needs a high-dimensional space to represent the documents, and it does not take into account the semantic relationships between terms, which can lead to poor classification performance. In this paper, SVD is used to learn and represent the relations among very large numbers of words and the very large numbers of natural text passages in which they occur. It not only greatly reduces the dimensionality but also uncovers important associative relationships between terms. The experiments show that it also accelerates training and improves classification accuracy.
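
The abstract does not say how the term-document matrix is weighted or how many singular values are kept. The following Python sketch shows one common reading of the SVD step, a latent semantic indexing (LSI) style truncated SVD: raw term counts form the VSM matrix, and each document is mapped to k latent features. The raw-count weighting, the helper names (`build_term_doc`, `fit_lsi`, `project`, `cosine`), the choice of k, and the toy corpus are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_term_doc(docs):
    """Raw term-frequency VSM (assumed weighting): rows are index terms, columns are documents."""
    vocab = sorted({t for d in docs for t in d})
    row = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[row[t], j] += 1.0
    return vocab, A

def fit_lsi(A, k):
    """Keep the k largest singular triplets of A = U S V^T (LSI-style truncation)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # s is returned in descending order
    return U[:, :k], s[:k]

def project(U_k, D):
    """Map document columns D (terms x n) to an n x k matrix of latent features U_k^T d.
    For the training documents this equals V_k S_k, so unseen documents are folded in the same way."""
    return (U_k.T @ D).T

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

if __name__ == "__main__":
    # Hypothetical toy corpus: 5 index terms x 4 documents on two topics.
    A = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 1, 2],
                  [0, 0, 2, 1],
                  [1, 1, 1, 1]], dtype=float)
    U_k, s_k = fit_lsi(A, k=2)
    X = project(U_k, A)  # 4 documents, now 2 features each instead of 5
    print(X.shape, cosine(X[0], X[1]), cosine(X[0], X[2]))
```

In this toy corpus, documents 0 and 1 land on the same latent point, while documents 0 and 2 have a low cosine of about 0.18. Under this reading, the reduced vectors, rather than the full term vectors, would feed the input layer of the classifier.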

Key words: Singular Value Decomposition(SVD), neural network, text categorization, BP algorithm, cascade-correlation algorithm

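Abstract (Chinese version): A new text classification model based on a Cascade-Correlation Neural Network (CCNN) and SVD (Singular Value Decomposition) is proposed. The network is trained with the cascade-correlation algorithm. Most text classification systems represent documents with the Vector Space Model (VSM); however, this method requires a very high dimensionality and cannot capture the latent semantic information among the feature terms of a text, so its classification performance is not ideal. SVD is introduced to learn and represent the text feature terms, revealing the latent information in the text features while reducing the feature dimensionality. Experiments show that the model speeds up training and also improves classification accuracy.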
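
Cascade-correlation grows the network one hidden unit at a time: a pool of candidate units is scored by how strongly each candidate's output covaries with the current residual output errors, and the best one is frozen into the network. The abstract gives no training details, so the following Python sketch shows only that candidate-scoring step of the standard algorithm (Fahlman and Lebiere's correlation score S), not the authors' implementation. The helper names, the tanh activation, and the toy data are hypothetical; in the real algorithm each candidate's input weights are first trained by gradient ascent on S, whereas here they are fixed.

```python
import numpy as np

def candidate_score(v, E):
    """Correlation score S of one candidate hidden unit:
    S = sum_o | sum_p (v_p - mean(v)) * (E_po - mean_p(E_o)) |
    v: (P,) candidate outputs on P training patterns
    E: (P, O) residual errors of the O output units on the same patterns."""
    dv = v - v.mean()
    dE = E - E.mean(axis=0)  # centre each output's error over the patterns
    return float(np.abs(dv @ dE).sum())  # absolute value per output, then summed

def pick_best_candidate(X, W, E):
    """Score a pool of candidates and return (index, score) of the best one.
    X: (P, n) inputs a candidate sees (original inputs plus frozen hidden units)
    W: (C, n) candidate input weights, held fixed in this sketch."""
    scores = [candidate_score(np.tanh(X @ w), E) for w in W]
    best = int(np.argmax(scores))
    return best, scores[best]

if __name__ == "__main__":
    X = np.array([[1.0], [-1.0], [1.0], [-1.0]])  # hypothetical: 4 patterns, 1 input
    E = np.array([[0.5], [-0.5], [0.5], [-0.5]])  # hypothetical residual errors, 1 output
    W = np.array([[0.0], [2.0]])  # candidate 0 is constant, candidate 1 tracks the error
    print(pick_best_candidate(X, W, E))
```

Taking the absolute value per output lets a candidate count whether it correlates positively or negatively with an output's error, since the output weights learned afterwards can take either sign.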