Optimized K-means clustering algorithm for massive data

Abstract

To address the difficulty of clustering massive data under a centralized system framework, an optimized K-means clustering algorithm based on MapReduce is proposed. Using the MapReduce parallel programming framework and introducing Canopy clustering, the algorithm optimizes the selection of the initial cluster centers for K-means and improves the communication and computation patterns of the iterative process. Experimental results show that the algorithm effectively improves clustering quality, achieves high execution efficiency and good scalability, and is therefore suitable for clustering analysis of massive data.

Keywords: massive data, clustering, MapReduce, K-means algorithm, Canopy algorithm
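
The approach summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not give the Canopy thresholds, the exact communication scheme, or any function names, so `canopy_centers`, `map_assign`, `reduce_update`, the threshold `t2`, and the use of per-split partial sums as a stand-in for a MapReduce combiner are all assumptions made for illustration. A cheap Canopy pass picks the initial centers (and thereby the number of clusters), and each K-means iteration is then written as a map step over data splits followed by a reduce step.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Canopy-style seeding (assumed variant).

    Repeatedly take the first remaining point as a center and discard every
    remaining point within the tight threshold t2 of it. The loose threshold
    T1 of full Canopy clustering only affects canopy membership, so it is
    omitted here because only the centers are needed.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def map_assign(split, centers):
    """Map plus combine: per-split partial sums keyed by nearest center."""
    partial = {}
    for p in split:
        k = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        sums, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(sums, p)], count + 1)
    return partial


def reduce_update(partials, centers):
    """Reduce: merge the partial sums and recompute each center as a mean."""
    totals = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            t_sums, t_count = totals.get(k, ([0.0] * len(sums), 0))
            totals[k] = ([a + b for a, b in zip(t_sums, sums)], t_count + count)
    new_centers = []
    for k, old in enumerate(centers):
        if k in totals:
            sums, count = totals[k]
            new_centers.append(tuple(s / count for s in sums))
        else:
            new_centers.append(old)  # no points assigned: keep the old center
    return new_centers


def kmeans(splits, centers, max_iter=20, tol=1e-6):
    """Iterate map and reduce until no center moves by more than tol."""
    for _ in range(max_iter):
        partials = [map_assign(split, centers) for split in splits]
        new_centers = reduce_update(partials, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

Calling `kmeans(splits, canopy_centers(points, t2))` runs the whole pipeline, where `splits` is the data partitioned into lists. Emitting one partial sum per center per split, rather than one record per point, is one common way to cut the data exchanged between map and reduce; whether the paper uses this exact scheme is not stated in the abstract.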

JI Suqin, SHI Hongbo. Optimized K-means clustering algorithm for massive data[J]. Computer Engineering and Applications, 2014, 50(14): 143-147.
