Computer Engineering and Applications (计算机工程与应用) ›› 2008, Vol. 44 ›› Issue (28): 133-135. DOI: 10.3778/j.issn.1002-8331.2008.28.045

• Database, Signal and Information Processing •

Large data sets clustering analysis based on distribution

JIA Jun-fang (贾俊芳), ZHANG Ri-quan (张日权)

  1. Mathematics and Computer Institute, Datong University, Datong, Shanxi 037009, China
  • Received: 2007-11-20 Revised: 2008-02-28 Online: 2008-10-01 Published: 2008-10-01
  • Contact: JIA Jun-fang

Abstract: To improve clustering efficiency, we propose a distributed clustering algorithm for large data sets. Rather than clustering all the data in a single pass, the method randomly partitions the large data set into several subsets, clusters all the subsets simultaneously, and finally merges the resulting clusters. Experimental results show that in most cases the method produces the same result as traditional one-pass clustering, while greatly improving clustering speed.

Key words: clustering analysis, distributed, large data sets
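
The partition, cluster, and merge steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the base clustering algorithm or the merge rule, so the sketch assumes k-means on each subset and merges by re-clustering the size-weighted subset centroids. A thread pool stands in for distributed workers, and the names `kmeans` and `distributed_cluster` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, weights=None, iters=20):
    """Weighted Lloyd's k-means (assumed base clusterer, not from the paper).

    Seeds centroids by farthest-first traversal so the run is deterministic.
    Returns a list of (centroid, total_weight) pairs for non-empty clusters.
    """
    weights = weights or [1.0] * len(points)
    dim = len(points[0])
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    groups = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append((p, w))
        # Update step: move each centroid to the weighted mean of its group.
        for i, g in enumerate(groups):
            if g:
                total = sum(w for _, w in g)
                centroids[i] = tuple(
                    sum(p[d] * w for p, w in g) / total for d in range(dim)
                )
    return [(centroids[i], sum(w for _, w in g))
            for i, g in enumerate(groups) if g]


def distributed_cluster(points, k, n_subsets=3, seed=0):
    """Randomly split the data, cluster each subset, then merge the clusters."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    # Random partition into n_subsets roughly equal subsets.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Cluster all subsets concurrently (threads stand in for remote nodes).
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        local = list(pool.map(lambda s: kmeans(s, k), subsets))
    # Merge step (assumed rule): cluster the local centroids, weighting each
    # by the number of points it represents.
    reps = [c for result in local for c, _ in result]
    sizes = [w for result in local for _, w in result]
    return [c for c, _ in kmeans(reps, k, weights=sizes)]
```

Calling `distributed_cluster(data, k=3, n_subsets=3)` on a list of coordinate tuples returns up to `k` merged centroids. Because the merge step only sees one representative per local cluster, its cost depends on `k * n_subsets` rather than on the size of the full data set, which is one plausible reading of where the reported speed-up comes from.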