Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (14): 133-137.
YU Qianqian, DAI Yueming
虞倩倩,戴月明
Abstract: Fuzzy C-means is an important soft-clustering algorithm, but its running time grows quickly as the amount of data increases. This paper proposes a parallel fuzzy C-means algorithm based on MapReduce. The fuzzy C-means algorithm is redesigned to fit the MapReduce programming model: the membership degrees of the data points to the cluster centers are computed in parallel, and the new cluster centers are then recalculated, which improves computational efficiency on large data sets. Experimental results show that the MapReduce-based parallel fuzzy C-means algorithm achieves both high speedup and good scalability.
Key words: fuzzy C-means, parallel computing, MapReduce, data mining, cloud computing
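Abstract (translated from Chinese): Fuzzy C-means is an important soft-clustering algorithm. To address its drawback that the time complexity becomes too high as the data volume grows, a parallel fuzzy C-means algorithm based on MapReduce is proposed. The algorithm redesigns fuzzy C-means to fit the key/value-based programming model of MapReduce, computes the membership degrees of the data set to the center points in parallel, and recalculates the new cluster centers, which improves the computational efficiency of fuzzy C-means on large volumes of data. Experimental results show that the MapReduce-based parallel fuzzy C-means algorithm achieves high speedup and scalability.
Key words (translated from Chinese): fuzzy C-means, parallel computing, MapReduce programming model, data mining, cloud computing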
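The abstract states only that membership degrees are computed in parallel and that the cluster centers are then recalculated; it does not give the key/value layout. The following is a minimal single-process sketch of how one fuzzy C-means iteration could be phrased as a map step and a reduce step. It assumes the standard fuzzy C-means update formulas with fuzzifier m = 2; the function names (`memberships`, `map_point`, `reduce_cluster`, `fcm_iteration`), the choice of cluster index as key, and the sample data are all hypothetical and are not taken from the paper. It simulates the shuffle with a dictionary and does not use Hadoop.

```python
from collections import defaultdict
from math import dist  # requires Python 3.8+


def memberships(point, centers, m=2.0):
    """Membership of one point to every center (standard FCM formula)."""
    d = [dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(x == 0.0 for x in d):
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


def map_point(point, centers, m=2.0):
    """Map step (assumed layout): emit (cluster index, (weighted point, weight))."""
    for j, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def reduce_cluster(values):
    """Reduce step: weighted mean of all contributions for one cluster."""
    values = list(values)
    total_w = sum(w for _, w in values)
    dims = len(values[0][0])
    return [sum(v[k] for v, _ in values) / total_w for k in range(dims)]


def fcm_iteration(points, centers, m=2.0):
    """One iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        for key, value in map_point(p, centers, m):
            groups[key].append(value)
    return [reduce_cluster(groups[j]) for j in range(len(centers))]


# Illustrative data only: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(20):
    centers = fcm_iteration(points, centers)
```

In this layout each mapper needs only the current centers and its own share of the points, so the membership computation is independent per point. The reducers see only per-cluster partial sums, which could also be pre-aggregated on the map side before the shuffle.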
YU Qianqian, DAI Yueming. Parallel fuzzy C-means algorithm based on MapReduce[J]. Computer Engineering and Applications, 2013, 49(14): 133-137.
虞倩倩,戴月明. 基于MapReduce的并行模糊C均值算法[J]. 计算机工程与应用, 2013, 49(14): 133-137.
URL: http://cea.ceaj.org/EN/
http://cea.ceaj.org/EN/Y2013/V49/I14/133