Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (16): 117-120.

• Database, Data Mining, Machine Learning •

Parallel ACO-K-means clustering algorithm based on MapReduce

YU Qianqian, DAI Yueming, LI Jingjing   

  1. School of IOT, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2013-08-15 Published:2013-08-15

Abstract: K-means runs into severe memory shortages when it is applied to massive data sets, so this paper uses MapReduce to parallelize it. Because ordinary K-means converges slowly, easily falls into local optima, and is sensitive to the choice of initial cluster centers, Ant Colony Optimization (ACO) is introduced into K-means, giving the ACO-K-means clustering algorithm. The experimental results show that the ACO-K-means clustering algorithm parallelized with MapReduce has high speedup and good scalability, and that its convergence and clustering accuracy are also improved to some degree.

Key words: data mining, MapReduce, Ant Colony Optimization (ACO), K-means, cloud computing
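
To make the parallelization idea in the abstract concrete, the following is a minimal single-process sketch of how one plain K-means iteration could be split into a map step and a reduce step. It is not the authors' implementation: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) and the toy data are assumptions made for illustration, the grouping that a MapReduce framework would do between the two phases is simulated with a dictionary, and the ACO component is left out because the abstract does not say how it is combined with K-means.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centers):
    """Map step (illustrative): emit (index of nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def kmeans_reduce(assigned_points):
    """Reduce step (illustrative): average the points assigned to one center."""
    dims = len(assigned_points[0])
    count = len(assigned_points)
    return tuple(sum(p[d] for p in assigned_points) / count for d in range(dims))


def kmeans_iteration(points, centers):
    """One iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)  # stands in for the framework's shuffle phase
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    # A center that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centers[i])
        for i in range(len(centers))
    ]


# Toy data, assumed for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = kmeans_iteration(points, centers)
print(new_centers)  # [(1.0, 1.5), (9.5, 9.0)]
```

Because each call to `kmeans_map` depends only on one point and the current centers, the map calls can run on separate nodes over separate data splits, which is why the full data set never has to fit in the memory of a single machine.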