Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (24): 133-138.

• Database, Data Mining, Machine Learning •

Improved ART2 neural network for text clustering based on LSA

XU Chenkai, GAO Maoting

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Published: 2014-12-15 Online: 2014-12-12

Abstract: To address the high dimensionality of text data and the need for dynamic clustering, an improved ART2 neural network text clustering algorithm combined with Latent Semantic Analysis (LSA) dimensionality reduction is proposed. LSA brings out the semantic relations between texts and terms, reduces useless noise, and lowers the data dimensionality and computational complexity. The algorithm adopts an improved intermediate learning method that reduces the number of computation steps and speeds up the ART2 network, and it uses nearest-neighbor dynamic regrouping to improve the stability of ART2 clustering and weaken its dependence on the input order of the samples. Experiments show that the improved algorithm can effectively perform dynamic text clustering.

Key words: ART2 neural network, nearest neighbor, Latent Semantic Analysis (LSA), dimensionality reduction, text clustering, clustering analysis
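
The abstract gives no implementation details, so the following is only a minimal sketch of the pipeline shape it describes: reduce documents with LSA, then cluster them incrementally. LSA is shown as a truncated SVD of a term-document matrix. The clustering step is a simplified leader-style scheme with a vigilance threshold, used here as a stand-in; it is not the paper's improved ART2 network and does not include its nearest-neighbor regrouping. The names `lsa_reduce` and `vigilance_cluster`, the toy matrix, and the threshold `rho` are all assumptions made for illustration.

```python
import numpy as np


def lsa_reduce(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD.

    term_doc is an (n_terms, n_docs) weighted term-document matrix.
    Returns an (n_docs, k) array with one row per document.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their right singular vectors.
    return (np.diag(s[:k]) @ Vt[:k]).T


def vigilance_cluster(vectors, rho):
    """Simplified stand-in for ART-style clustering (assumption, not ART2).

    Each sample joins the most similar prototype if their cosine similarity
    is at least rho; otherwise it starts a new cluster.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    prototypes, labels = [], []
    for v in unit:
        sims = [float(p @ v) for p in prototypes]
        best = int(np.argmax(sims)) if sims else -1
        if best >= 0 and sims[best] >= rho:
            # Nudge the winning prototype toward the sample and renormalise.
            p = prototypes[best] + v
            prototypes[best] = p / np.linalg.norm(p)
            labels.append(best)
        else:
            prototypes.append(v.copy())
            labels.append(len(prototypes) - 1)
    return labels


# Toy term-document matrix (rows: terms, columns: documents), invented for
# illustration: documents 0-1 share one vocabulary, documents 2-3 another.
X = np.array([
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

reduced = lsa_reduce(X, k=2)           # 4 documents, 2 latent dimensions
labels = vigilance_cluster(reduced, rho=0.9)
print(reduced.shape, labels)
```

A scheme this simple depends on the order in which samples arrive, which is the kind of weakness the abstract says the nearest-neighbor regrouping step is meant to reduce.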