Computer Engineering and Applications, 2016, Vol. 52, Issue (1): 66-70.

• Big Data and Cloud Computing •

Parallel SFLA-FCM clustering algorithm based on MapReduce

GOU Jie, MA Zitang

  1. The Third Institute, PLA Information Engineering University, Zhengzhou 450000, China
  • Online: 2016-01-01  Published: 2015-12-30

Abstract: Fuzzy C-Means (FCM) is a widely used clustering algorithm. However, its clustering quality depends on the choice of initial cluster centers, and it is prone to becoming trapped in local extrema. Drawing on the strong search capability of the Shuffled Frog Leaping Algorithm (SFLA), this paper proposes a parallel SFLA-FCM clustering algorithm based on MapReduce. The algorithm uses SFLA's memetic information transfer within subgroups and its global information exchange to search for high-quality cluster centers. Its workflow is designed around the MapReduce programming model so that it runs in parallel and can handle large-scale datasets. Experiments show that parallel SFLA-FCM improves search capability and the accuracy of the clustering results, and achieves good speedup and scalability.

Key words: clustering, Fuzzy C-Means (FCM), Shuffled Frog Leaping Algorithm (SFLA), MapReduce
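
As a rough illustration of how a single FCM iteration might be cast as map and reduce steps, the sketch below uses the standard FCM membership and center-update formulas in plain Python. It is not the authors' implementation: the function names, the fuzzifier m = 2, and the in-memory stand-in for a MapReduce job are all assumptions made for this example, and the SFLA search itself is omitted. The `objective` function shows the kind of quantity an SFLA-style search could plausibly use as a fitness value for a candidate set of centers.

```python
import math
from collections import defaultdict

def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (assumed m > 1)."""
    d = [math.dist(x, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]

def fcm_map(x, centers, m=2.0):
    """Map step (hypothetical): emit (cluster index, (u^m * x, u^m)) per point."""
    for j, u in enumerate(memberships(x, centers, m)):
        w = u ** m
        yield j, ([w * xi for xi in x], w)

def fcm_reduce(pairs, dim):
    """Reduce step (hypothetical): sum partials per cluster, form new centers."""
    num = defaultdict(lambda: [0.0] * dim)
    den = defaultdict(float)
    for j, (wx, w) in pairs:
        num[j] = [a + b for a, b in zip(num[j], wx)]
        den[j] += w
    return [tuple(v / den[j] for v in num[j]) for j in sorted(num)]

def fcm_iteration(points, centers, m=2.0):
    """One FCM update, run in memory instead of on a real cluster."""
    pairs = [kv for x in points for kv in fcm_map(x, centers, m)]
    return fcm_reduce(pairs, dim=len(points[0]))

def objective(points, centers, m=2.0):
    """FCM objective J = sum of u^m * squared distance; lower is better."""
    return sum(
        (u ** m) * math.dist(x, c) ** 2
        for x in points
        for u, c in zip(memberships(x, centers, m), centers)
    )

if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers = [(1.0, 1.0), (9.0, 9.0)]  # arbitrary starting guess
    for _ in range(10):
        centers = fcm_iteration(points, centers)
    print(centers, objective(points, centers))
```

In a real MapReduce job the map output would be grouped by cluster index across machines, and a combiner could pre-sum the partial numerators and denominators on each node to cut shuffle traffic; both are design choices for this sketch, not details taken from the paper.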