Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (28): 133-135.DOI: 10.3778/j.issn.1002-8331.2008.28.045

• Database, Signal and Information Processing •

Large data sets clustering analysis based on distribution

JIA Jun-fang,ZHANG Ri-quan   

  1. Mathematics and Computer Institute,Datong University,Datong,Shanxi 037009,China
  • Received:2007-11-20 Revised:2008-02-28 Online:2008-10-01 Published:2008-10-01
  • Contact: JIA Jun-fang




Abstract: To improve clustering efficiency, we propose a distributed clustering algorithm for large data sets. Rather than clustering all the data at once, the method randomly divides the data set into several subsets, clusters all the subsets simultaneously, and finally merges the resulting clusters. Experimental results show that in most cases the result is the same as that of a traditional one-pass clustering algorithm, while the clustering speed improves greatly.

Key words: clustering analysis, distributed, large data sets
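
The three steps named in the abstract (random split, simultaneous clustering of the subsets, merging of clusters) can be illustrated with a minimal sketch. The abstract does not say which base clustering algorithm or which merge rule the authors use, so everything below is an assumption made for illustration: plain k-means as the per-subset clusterer, a distance threshold `radius` for merging local centroids, and the hypothetical function names `kmeans`, `merge_clusters`, and `distributed_cluster`.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed base clusterer, not from the paper).

    Returns a list of (centroid, size) pairs, one per non-empty cluster.
    """
    if not points:
        return []
    k = min(k, len(points))
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[j].append(p)
        # Keep the old centroid if a cluster lost all of its points.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


def merge_clusters(local, radius):
    """Greedily merge local clusters whose centroids lie within `radius`.

    The merge rule is an assumption; merged centroids are size-weighted means.
    """
    merged = []  # each entry is [centroid, size]
    for c, n in local:
        for m in merged:
            if dist2(c, m[0]) <= radius ** 2:
                total = m[1] + n
                m[0] = tuple((a * m[1] + b * n) / total for a, b in zip(m[0], c))
                m[1] = total
                break
        else:
            merged.append([c, n])
    return [(c, n) for c, n in merged]


def distributed_cluster(points, parts, k, radius, seed=0):
    """Randomly split `points` into `parts` subsets, cluster each, then merge."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::parts] for i in range(parts)]
    # Threads only illustrate the "cluster all subsets at once" step; for real
    # CPU-bound speedup in CPython a process pool or several machines is needed.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(lambda s: kmeans(s, k, seed=seed), subsets))
    local = [pair for res in results for pair in res]
    return merge_clusters(local, radius)
```

Calling `distributed_cluster(points, parts=4, k=2, radius=3.0)` on a list of coordinate tuples returns a list of `(centroid, size)` pairs. The quality of the merged result depends on the choice of `radius` and on each random subset being representative of the whole data set, which is consistent with the abstract's remark that the result matches one-pass clustering in most cases rather than in all of them.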
